I, Joan T. Odell, am a citizen of the United States of America, residing at 127
Monitor Place, Unionville, Pennsylvania, United States of America, and I declare as
follows:

1. I am one of the co-inventors named in the above-identified application.

2. I received a B.A. degree in Biology from the University of California at San
Diego in 1975. I received a Ph.D. degree in Biology from the University of California
at San Diego in 1981. I was a Postdoctoral Fellow at the Rockefeller University from
1981 to 1985.

3. I have been employed by E. I. du Pont de Nemours and Company from
1985 to the present. From 1985 to 2001, I was a Principal Investigator, conducting
and directing research in the areas of plant gene expression and genetic
engineering. From 2001 to 2003, I served as a Six Sigma specialist. From 2003 to
the present, I have served as a patent liaison. I became a registered patent agent
with the United States Patent and Trademark Office in 2005 (Registration Number
156,870).

4. I have reviewed the Office Action dated October 19, 2005. I am aware that
this declaration is being submitted to illustrate work which I, or those working under
my guidance, have done to demonstrate that the polypeptide set forth in SEQ ID
NO:36 (encoded by SEQ ID NO:35) functions as a Myb-related transcription factor in
a transgenic cell or plant.

5. An expression cassette was constructed for expression of the soybean
Myb-related transcription factor, "soyMyb2" (SEQ ID NO:36; encoded by SEQ ID
NO:35). The soyMyb2 expression cassette comprised the following elements:
(a) CaMV 35S promoter;
(b) Modified protein-coding region of SEQ ID NO:35; and
(c) Nopaline synthase (nos) 3' end.

The soyMyb2 expression cassette was constructed in the following manner:
The soyMyb2 sequence was isolated by PCR amplification of the clone
sfl1.pk0105.e6, using the following two primers:
(a) Forward Primer-1: CACAAGTTCATGAATAAAAAACAAC; and
(b) Reverse Primer-2: CAAACCCAATAATATGTTTTAA.
The Forward Primer-1 introduced a BspHI site, 5'-TCATGA, into the soyMyb2
sequence, which overlapped the start methionine codon (ATG). This primer
produced a point mutation in the soyMyb2 protein-coding region (PCR product). The
second amino acid of soyMyb2 was changed from aspartic acid to asparagine (GAT
-> AAT). The deduced nucleotide sequence of the soyMyb2 PCR product is
presented in Appendix B, which accompanies this paper. This nucleotide sequence
comprises the following segments: nucleotides 1-25, which correspond to the
nucleotide sequence of Forward Primer-1 (primer nucleotides 15-25 correspond to
nucleotides 34-44 of SEQ ID NO:35); and nucleotides 26-709, which correspond to
nucleotides 45-728 of SEQ ID NO:35 and which include nucleotides 688-709, which
correspond to the reverse complement of the nucleotide sequence of Reverse
Primer-2.
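The primer description above can be checked mechanically. The following is a minimal sketch, not part of the declaration: it locates the BspHI site in Forward Primer-1, confirms that the ATG lies inside it, and reads the codon that follows. The function name `analyze_primer` and the three-entry codon table are illustrative assumptions; only the primer sequence and the site 5'-TCATGA come from the text.

```python
# Forward Primer-1 as given in the declaration (5' -> 3').
FORWARD_PRIMER_1 = "CACAAGTTCATGAATAAAAAACAAC"
BSPHI_SITE = "TCATGA"  # BspHI recognition sequence stated in the text

# Only the codons needed for this check (standard genetic code).
CODON_TABLE = {"ATG": "Met", "GAT": "Asp", "AAT": "Asn"}


def analyze_primer(primer, site):
    """Locate the restriction site and the ATG it overlaps, then read codon 2.

    Hypothetical helper for illustration; positions are reported 1-based.
    """
    site_start = primer.find(site)
    if site_start < 0:
        raise ValueError("restriction site not found in primer")
    atg_start = primer.find("ATG", site_start)
    if atg_start < 0:
        raise ValueError("no ATG at or after the restriction site")
    second_codon = primer[atg_start + 3 : atg_start + 6]
    return {
        "site_start": site_start + 1,
        "atg_start": atg_start + 1,
        "atg_inside_site": atg_start + 3 <= site_start + len(site),
        "second_codon": second_codon,
        "second_residue": CODON_TABLE.get(second_codon, "?"),
    }


result = analyze_primer(FORWARD_PRIMER_1, BSPHI_SITE)
print(result)
```

The last base of the BspHI site is the first base of the second codon, which is why creating the site forces the GAT to AAT change described above.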

The soyMyb2 PCR product was cloned into the vector pCR™2.1-TOPO
using TOPO™ TA cloning. The resulting plasmid was called myb2-topo2.1.
A nucleic acid fragment encoding soyMyb2 was then isolated from the
plasmid myb2-topo2.1 using restriction enzymes BspHI and KpnI. The KpnI
cleavage site is within the pCR™2.1-TOPO vector sequence. The BspHI-KpnI
fragment containing the soyMyb2 coding region was then cloned into plasmid DNA
pMH40, which had been restriction digested with NcoI and KpnI (the BspHI and NcoI
5'-overhangs are identical). The modified soyMyb2 PCR product replaced the GUS
coding region that was present in the pMH40 expression vector. The resulting
plasmid was called myb305-2-pMH40.
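The statement that the BspHI and NcoI 5'-overhangs are identical can be illustrated with a small model of palindromic restriction sites. This is a sketch under stated assumptions: the `Enzyme` class and the `compatible` and `junction` helpers are hypothetical names, and the cut positions used (T^CATGA for BspHI, C^CATGG for NcoI, GGTAC^C for KpnI) are the commonly documented ones rather than values given in the declaration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enzyme:
    name: str
    site: str  # palindromic recognition sequence, 5'->3'
    cut: int   # number of bases 5' of the top-strand cut within the site

    def overhang(self):
        """Return (kind, sequence) of the single-stranded end left by cutting."""
        n = len(self.site)
        lo, hi = sorted((self.cut, n - self.cut))
        if self.cut == n - self.cut:
            kind = "blunt"
        elif self.cut < n - self.cut:
            kind = "5'"
        else:
            kind = "3'"
        return kind, self.site[lo:hi]


def compatible(a, b):
    """Cohesive ends can anneal when overhang type and sequence agree."""
    kind_a, seq_a = a.overhang()
    return kind_a != "blunt" and (kind_a, seq_a) == b.overhang()


def junction(vector_enzyme, insert_enzyme):
    """Top-strand sequence across the ligated vector/insert junction."""
    return (vector_enzyme.site[: vector_enzyme.cut]
            + insert_enzyme.site[insert_enzyme.cut :])


BSPHI = Enzyme("BspHI", "TCATGA", 1)
NCOI = Enzyme("NcoI", "CCATGG", 1)
KPNI = Enzyme("KpnI", "GGTACC", 5)

print(BSPHI.overhang(), NCOI.overhang(), KPNI.overhang())
print(compatible(BSPHI, NCOI), junction(NCOI, BSPHI))
```

In this model the NcoI-cut vector end joined to the BspHI-cut insert end reads CCATGA, so the ATG is preserved while neither original recognition site is regenerated.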

In plasmid myb305-2-pMH40, a CaMV 35S promoter was used to drive gene
expression of the soyMyb2 PCR product. The CaMV 35S promoter is a 1337-bp
fragment, as illustrated in Appendix C, accompanied herewith. The nos 3'-end used
to regulate transcription termination is a 761-bp fragment, as illustrated in the
accompanying Appendix D.

The soyMyb2 expression cassette from plasmid myb305-2-pMH40 was
cloned into a binary vector, pZBLNIN. The binary vector pZBLNIN contains right
and left T-DNA borders, as well as plant and bacterial expression cassettes, each
containing a neomycin phosphotransferase II ("nptII") gene for selection using
kanamycin. The resulting vector, myb305-2-pZBL1N, was transformed into
Arabidopsis.

Arabidopsis that were successfully transformed with the soyMyb2 expression
cassette produced purple seedlings. The cotyledons, hypocotyls, and older parts of
the roots also displayed purple color. The adult plants looked normal but exhibited
slower growth and produced fewer and poorer seeds.

The red and purple pigmentation in plant tissues is due to secondary
metabolites called anthocyanins, which are produced through the flavonoid pathway,
a branch of the general phenylpropanoid pathway. As an example, this pathway is
discussed in Uimari et al., 1997 Plant J 12:1273-1284. A copy of Uimari et al. is
provided herewith.

These results indicated that expression of the soyMyb2 chimeric gene
activated the anthocyanin pathway in transgenic Arabidopsis.

6. An expression cassette was constructed for expression of a chimeric
Myb-related transcription factor, "soyMyb2-PvALF". PvALF (GI No. 1046278) is an
ABI3-like transcription factor from Phaseolus vulgaris. The PvALF transcription factor has
a transcription activation domain at the amino-terminus (Bobb et al., 1995 Plant J
8:101-113). The soyMyb2-PvALF expression cassette comprised the following
elements:
(a) CaMV 35S promoter;
(b) Nucleotides 29 - 442 of SEQ ID NO:35, which encode the DNA-binding
domain of soyMyb2 (amino acids 1 - 138 of SEQ ID NO:36);
(c) 241 amino acid transcriptional activation domain of PvALF; and
(d) Nopaline synthase (nos) 3'-end.

The soyMyb2-PvALF expression cassette was constructed in the following
manner:
The DNA-binding domain of the soyMyb2 gene was isolated by PCR amplification of
the clone sfl1.pk0105.e6, using the following primers:
(a) Forward Primer-3:
TGTCACCATGGATAAAAAACAACAGTGTAAQACGTC;
(b) Reverse Primer-4:
TTTGGACCCCGGGAATTCGTGATCATTTATCTCAGAATTATTACTACTC
The Forward Primer-3 provided an NcoI recognition site (5'-CCATGG), overlapping
the start methionine codon. The Reverse Primer-4 provided SmaI (5'-CCCGGG)
and EcoRI (5'-GAATTC) recognition sites. The deduced nucleotide sequence of the
resulting PCR product, the DNA-binding domain of soyMyb2, is presented in Appendix
E, a copy of which is attached hereto. This nucleotide sequence comprises the
following segments: nucleotides 1-36, which correspond to the nucleotide sequence
of Forward Primer-3; nucleotides 37-390, which correspond to nucleotides 58-411 of
SEQ ID NO:35; and nucleotides 391-439, which correspond to the reverse
complement of the sequence of Reverse Primer-4.
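The recognition sites and segment lengths stated for Forward Primer-3 and Reverse Primer-4 can be cross-checked with a short scan. This is an illustrative sketch only: `find_sites` is a hypothetical helper, the primer strings are transcribed from the text with internal spaces removed, and the one unreadable character in Forward Primer-3 is kept exactly as it appears in the source rather than guessed.

```python
# Primers as transcribed from the text (5' -> 3'), whitespace removed.
PRIMERS = {
    "Forward Primer-3": "TGTCACCATGGATAAAAAACAACAGTGTAAQACGTC",
    "Reverse Primer-4": "TTTGGACCCCGGGAATTCGTGATCATTTATCTCAGAATTATTACTACTC",
}

# Recognition sequences named in the text.
SITES = {"NcoI": "CCATGG", "SmaI": "CCCGGG", "EcoRI": "GAATTC"}

# Lengths implied by the Appendix E segment coordinates (inclusive ranges).
SEGMENT_LENGTHS = {
    "Forward Primer-3": 36 - 1 + 1,
    "Reverse Primer-4": 439 - 391 + 1,
}


def find_sites(seq):
    """Map each site present in seq to the 1-based position of its first base."""
    return {name: seq.find(site) + 1 for name, site in SITES.items() if site in seq}


for name, seq in PRIMERS.items():
    length_ok = len(seq) == SEGMENT_LENGTHS[name]
    print(name, len(seq), length_ok, find_sites(seq))
```

In Reverse Primer-4 the SmaI and EcoRI sites share one G, and in Forward Primer-3 the ATG begins at primer position 8.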

The PCR product was cloned into the Promega pGEM™-T Easy vector
(AT-tailed). The resulting plasmid DNA was called pGMBD-5.

The plasmid 108G4Alf contains the phaseolin promoter, Gal4-PvALF fusion
protein, and phaseolin 3'-end. The use of the transcriptional activation domain of
PvALF in the construction of chimeric transcription factors has been previously
described (U.S. Patent No. 5,968,793; Example 2). A nucleic acid fragment from
pGMBD-5 that contained the soyMyb2 DNA-binding domain was obtained by
restriction digesting with NcoI and EcoRI. This fragment was cloned into plasmid
108G4Alf that also had been restriction digested with NcoI and EcoRI. This resulted
in replacement of both the Gal4 DNA-binding domain and the 5' region of the PvALF
activation domain (an EcoRI-EcoRI fragment) of plasmid 108G4Alf with the soyMyb2
DNA-binding domain. The resulting plasmid was named p108MBD. It contained the
intact soyMyb2 DNA-binding domain and the 3' region of the PvALF transcription
activation domain.

Plasmid p108MBD was restriction digested with EcoRI. The 5' region of the
PvALF activation domain was inserted as an EcoRI-EcoRI fragment from the
plasmid 35S-G4Alf. The resulting plasmid DNA, with an intact soyMyb2
DNA-binding domain and an intact PvALF activation domain, was called 108MybA.
The deduced nucleotide sequence of the chimeric soyMyb2-PvALF
protein-coding region is presented in Appendix F, a copy of which accompanies this paper.
This nucleotide sequence comprises the following segments: nucleotides 1-420,
which correspond to nucleotides 8-427 of the soyMyb2 DNA-binding domain PCR
product (presented in Appendix E); nucleotides 421-426, which correspond to the
SmaI linker preceding PvALF in plasmid p108G4Alf; nucleotides 427-1129, which
correspond to nucleotides 41-743 of GI No. 1046277 (PvALF); nucleotides
1130-1155, which correspond to the reverse complement of nucleotides 4-29 of the PvALF
PCR primer, Alf6-SalI (described in Bobb et al., 1995 Plant J 8:101-113); nucleotides
1156-1167, which correspond to nucleotides 2086-2097 of the linker region of
plasmid p108G4Alf; and nucleotides 1168-1176, which correspond to nucleotides
3287-3295 of plasmid pML63, the sequence that immediately precedes the nos
3'-end.
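The segment coordinates listed for Appendix F can be checked for internal consistency: each segment should start where the previous one ends, and each mapped segment should have the same length as its source range. The sketch below encodes the coordinates from the paragraph above; the tuple layout and the `check_segments` helper are assumptions made for illustration.

```python
# (product_start, product_end, source, source_start, source_end)
# A source range of None marks a linker with no stated source coordinates.
SEGMENTS = [
    (1, 420, "Appendix E PCR product", 8, 427),
    (421, 426, "SmaI linker (p108G4Alf)", None, None),
    (427, 1129, "GI No. 1046277 (PvALF)", 41, 743),
    (1130, 1155, "Alf6-SalI primer, reverse complement", 4, 29),
    (1156, 1167, "p108G4Alf linker region", 2086, 2097),
    (1168, 1176, "pML63", 3287, 3295),
]


def check_segments(segments):
    """Verify contiguity and matching lengths; return the total product length."""
    expected_start = 1
    for p_start, p_end, source, s_start, s_end in segments:
        if p_start != expected_start:
            raise ValueError(f"gap or overlap before segment from {source}")
        if s_start is not None and (p_end - p_start) != (s_end - s_start):
            raise ValueError(f"length mismatch for segment from {source}")
        expected_start = p_end + 1
    return expected_start - 1


total_length = check_segments(SEGMENTS)
print(total_length, total_length % 3 == 0, total_length // 3)
```

Under these coordinates the segments tile the sequence without gaps, and the total length is a whole number of codons, as expected for a protein-coding region.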

Plasmid pML63 contains an expression cassette containing the following
elements: a CaMV 35S promoter; a GUS coding sequence; and a nos 3'-end. The
GUS coding region of pML63 was replaced with the soyMyb2-PvALF coding region
in the following manner. The soyMyb2-PvALF coding region was isolated from
plasmid 108MybA as two restriction fragments, an NcoI-BglII fragment (for the 5'
region) and a BglII-SmaI fragment (for the 3' region). These two fragments were
cloned into plasmid pML63 that had been digested with NcoI and SmaI. The
resulting plasmid, containing a soyMyb2-PvALF expression cassette, was called
p35MybA.

The CaMV 35S promoter present in p35MybA is a 1404 nucleotide fragment,
which is presented in Appendix G, accompanied herewith.
The nos 3'-end region present in p35MybA is a 279 nucleotide fragment,
which is presented in Appendix H, accompanied herewith.

An XbaI fragment from p35MybA was cloned into the binary vector pZBLNIN.
The resulting plasmid, pB35MybA, was used to transform Arabidopsis.

Arabidopsis that were successfully transformed with the soyMyb2-PvALF
expression cassette produced purple seedlings. This purple color was slightly more
intense than that observed for the transgenic soyMyb2 seedlings described above.
The cotyledons, hypocotyls, and older parts of the roots were all purple. The adult
plants looked normal but exhibited slower growth and produced fewer and poorer
seeds.

These results indicated that expression of the novel soyMyb2-PvALF chimeric
gene activated the anthocyanin pathway in transgenic Arabidopsis.

In summary, the results of the transgenic Arabidopsis studies with soyMyb2
and the chimeric polypeptide soyMyb2-PvALF are believed to show that the
polypeptide set forth in SEQ ID NO:36 exhibits Myb-related transcription factor
activity. The activation of the anthocyanin pathway in the transgenic Arabidopsis
plants was evidenced by the purple color of the Arabidopsis seedlings produced by
the transformed Arabidopsis plants.

I declare further that all statements made herein of my own knowledge are
true and that all statements made on information and belief are believed to be true,
and further that these statements are made with the knowledge that willful false
statements and the like so made are punishable by fine or imprisonment, or both,
under Section 1001 of Title 18 of the United States Code and that such willful false
statements may jeopardize the validity of the application or any patent issuing
thereon.

Andrew J. Bobb, Hans Georg Eiben and
Mauricio M. Bustos*

Department of Biological Sciences, UMBC,
5401 Wilkens Ave., Baltimore, MD 21228-5398, USA

Summary

Mutations in Vp1 and ABI3 genes of maize and Arabidopsis
lead to drastic reductions in the synthesis of a subset
of maturation-specific products including seed storage
proteins. Gene Phaseolus vulgaris ABI3-like factor (PvAlf),
whose protein product is similar to the ABI3 and Vp1
proteins, has been cloned. Here, it is shown that PvAlf
positively regulates phaseolin and phytohemagglutinin
(PHA-L) promoters in particle bombardment assays. PvAlf
mRNA expression is embryo-specific and temporally
complex. PvAlf mRNA abundance is highest during two
periods (9-14 and 22-35 days after flowering) that precede
the onsets of seed maturation and seed abscission,
respectively. Protein fusions with the DNA-binding domain
of the yeast transcriptional activator GAL4 demonstrated
that the N-terminal 243 amino acids of PvAlf function
as a strong transcriptional activation domain in yeast
(Saccharomyces cerevisiae) and plant cells. This domain
consists of a central cluster rich in serine, threonine and
proline (STP cluster) flanked by two negatively charged
regions containing bulky hydrophobic residues similar to
acidic activation domains of Vp1, the herpes simplex virus
virion protein VP16 and transcription factors GCN4 and
HAP4 from yeast. Together with the Vp1 proteins of maize
and rice and ABI3, PvAlf constitutes a class (Vp1/ABI3-like
factors or VAlfs) of regulatory factors that are pivotal
for the promotion of seed maturation and dormancy in
angiosperms.

Received 16 January 1995; revised 19 May 1995; accepted 5 June 1995.
For correspondence (fax +1 410 455 3875).

Introduction

When cotyledon-stage embryos of many plant species are
cultured in water or in simple nutrient solutions they
undergo germinative changes normally seen only during
rehydration (imbibition) of mature, dry seeds (reviewed in
Crouch, 1987; Galau et al., 1991). Initial changes can occur
in the presence of inhibitors of RNA synthesis (Long et al.,
1981) suggesting that plant embryos are already capable
of germinating by the end of the cotyledon stage. However,
precocious germination in planta (vivipary) is rare; instead,
embryogeny proceeds into a maturation phase characterized
by abundant expression of a limited set of specific
genes (MAT genes) encoding storage proteins, lectins,
oil body proteins, enzymes involved in lipid and starch
metabolism, desiccation protectants and defense enzymes
(glucanases, chitinases, amylases, etc.). In angiosperms
(monocots and dicots), normal seed maturation and subsequent
dormancy are disrupted by aba and viviparous
mutations that affect the biosynthesis of the phytohormone
abscisic acid (Koorneef et al., 1982; Neill et al., 1986;
Robertson, 1955). Mutations that reduce the sensitivity of
seed tissues to abscisic acid, such as abi3 in Arabidopsis
thaliana and vp1 in maize, exhibit similar, albeit complex,
developmental phenotypes that include premature
germination (Robertson, 1955), insensitivity to abscisic
acid (Koorneef et al., 1984) and reduced storage protein
accumulation (Kriz et al., 1990; Nambara et al., 1992) among
other traits. However, vp1 alleles show a deficiency in
pigmentation (Hattori et al., 1992; Neill et al., 1986;
Robertson, 1955) that is not observed in abi3 mutants and,
therefore, it is not clear whether they represent the same
gene. Most relevant to this work are the dramatic effects
that mutations in ABI3 and Vp1 genes have on MAT gene
expression (Koorneef et al., 1989; Nambara et al., 1992,
1994; Paiva and Kriz, 1994; Pang et al., 1988; Parcy et al.,
1994; Pla et al., 1991); the strongest abi3 mutant alleles
(abi3-4 and abi3-6) cause a near complete loss of 2S
and 12S storage protein expression in Arabidopsis seeds
(Nambara et al., 1994; Parcy et al., 1994), and a similar
phenotype has been observed with respect to expression
of a maize globulin gene (Glb1) in vp1 null seeds (Kriz
et al., 1989). The ABI3 gene was isolated by map-based
positional cloning (Giraudat et al., 1992) and shown to
encode a protein similar to the product of Vp1 (McCarty
et al., 1991). Vp1 and the equivalent gene from rice, OsVp1,
can activate gene expression from Em and C1 promoters
in maize endosperm protoplast assays (Hattori et al., 1992,
1994; McCarty et al., 1991). The transcriptional activation
domain of Vp1 was localized within the first 121 amino
acids at the N-terminus (McCarty et al., 1991). Although
similar information has not been reported for ABI3, Parcy
et al. (1994) showed that ectopic expression of ABI3 and
exposure to high levels of ABA in transgenic Arabidopsis
leaves leads to activation of genes that are normally
expressed only in siliques, also consistent with a role for
ABI3 as a transcriptional activator.

In bean, phaseolin (a 7S seed storage protein) and
phytohemagglutinin (PHA, a lectin) are the most abund- 
antly expressed seed proteins, together accounting for 70- 
80% of the total protein content of a mature embryo. 7S 
and 11S storage proteins are also major constituents of 
soybean, pea and Vicia faba seeds (Casey et al., 1986). 
Lectins, too, represent important components of many 
dicot seeds and some lectin-related proteins have been 
found to possess insect deterring properties (Osborn et al.,
1988). Phaseolin and lectin genes are known to be seed- 
specific, appear to be coordinately regulated during 
maturation (Murray and Kennard, 1984; Staswick and 
Chrispeels, 1984), and provide useful molecular markers 
to investigate the control of seed maturation in bean. Here
we report the cloning of the gene Phaseolus vulgaris ABI3-like factor
(PvAlf), whose protein product is related to ABI3 and Vp1.
PvAlf activates both phaseolin and PHA promoters in 
cotyledon cells. PvAlf mRNA was found to be embryo- 
specific and expressed during two different periods of 
embryogeny: the first one precedes the induction of 
phaseolin and PHA expression and the onset of maturation, 
and the second coincides with predesiccation and desicca- 
tion stages. Ectopic expression of PvAlf in leaves is suffi- 
cient to activate transient expression from phaseolin and 
PHA promoters suggesting that other seed-specific factors 
are not required for PvAlf-mediated activation. When 
bound to a nearby promoter site, PvAlf activates transcrip- 
tion via a complex N-terminal, acidic domain. These results 
demonstrate a clear role for PvAlf as a positive transcrip- 
tional regulator of maturation-specific genes in bean and 
for related VAlfs in legumes.

Results

Cloning of Phaseolus vulgaris ABI3-like factor from
French bean

Evidence from physiological, genetic and molecular studies 
points to the involvement of the abscisic acid insensitive- 
3 (ABI3) gene of A. thaliana in the promotion of seed 
maturation and its specific program of gene expression 
(Finkelstein and Sommerville, 1990; Koorneef et al., 1984).
A method based on the rapid amplification of cDNA ends 
(3'-RACE) technique (Frohman et al., 1988) was used to 
clone an ABI3-like mRNA from developing bean embryos. 
Briefly, degenerate oligonucleotide primers were synthe- 
sized corresponding to the peptides MEDIGT and VWNMRY 
conserved in ABI3, Vp1 and OsVp1. Total cellular
poly(A)+ RNA from mid-maturation bean embryos was
reverse-transcribed using an oligo-dT primer and Super- 
script™ (BRL-GIBCO) reverse transcriptase and the 
resulting cDNA mixture was used as template for two 
nested polymerase chain amplification reactions with Taq 
DNA polymerase and the degenerate primers (Experimental
procedures). The products from the second amplification
were directly cloned into a plasmid vector
(pCR2000, Invitrogen) and those encoding an ABI3-like 
open reading frame (ORF) were identified by di-deoxy 
sequencing. The middle portion of the ABI3-like ORF was 
amplified using a degenerate oligonucleotide that corre- 
sponded to the conserved pentapeptide LPDFP found near 
the N-termini of Vp1 and ABI3, and two non-degenerate, 
gene-specific primers selected from the sequence of the 
3'-RACE clone. After two nested PCR reactions a product 
with the expected size, 1.8 kbp, was obtained, cloned and 
sequenced. Finally, the 5'-end of the ABI3-like ORF was 
obtained by 5'-RACE (Frohman et al., 1988), this time using
gene-specific primers deduced from the intermediate, 1.8 
kbp clone. After identifying sequences that appeared to 
include a translation start codon, new 5' and 3' gene- 
specific primers were synthesized and used to re-amplify 
the complete ORF from fresh poly(A)+ RNA yielding clone
pPvAlf. The longest open reading frame in pPvAlf encodes 
a 752 amino acid protein termed Phaseolus vulgaris ABI3- 
like factor (PvAlf). One of the partial cDNA clones isolated 
had a 9 bp duplication (arrowhead, Figure 1) that extends 
the PvAlf ORF by 3 amino acids and is likely to represent a 
different PvAlf allele. Only three single base pair differences 
were recorded out of more than 5.4 kbp of redundant PvAlf 
sequences. Even if all three were artifacts of PCR, an 
unlikely coincidence since they are all silent mutations, 
this result still indicates a high degree of accuracy (greater 
than 99.8% identity) in the reported PvAlf sequence, similar 
to those in public nucleotide databases. In Figure 1, the 
sequence of PvAlf (from clone pPvAlf) is aligned to ABI3 
(AtABI3), maize Vp1 (ZmVp1) and rice Vp1 (OsVp1). The
alignment was done using the programs pileup and gap
(University of Wisconsin Genetics Computer Group). Overall,
pair-wise identities for the group were low:
PvAlf-ABI3 = 48%, PvAlf-Vp1 = 38% and PvAlf-OsVp1 = 41%.
As noted before (Giraudat et al., 1992; Hattori et al., 1994),
the identity is concentrated on four domains corresponding
to positions 80-126 (domain I), 281-338 (domain II),
533-547 (domain III) and 655-772 (domain IV) on Figure 1.
We propose naming this class of proteins Vp1/ABI3-like
factors (VAlf), after Vp1 and ABI3, the first two members
to be cloned. VAlf domain IV is the most conserved (84%
overall sequence identity) suggesting that it performs an 
important function in monocot and dicot species. In Vp1, 
domain I is included within an N-terminal transcriptional 
activation domain. The homology of this region to corres- 
ponding portions of PvAlf and ABI3 is restricted to the runs 
of multiple serine residues. 
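The pair-wise identity figures and the sequence-accuracy estimate quoted above are simple ratios. The sketch below shows one way to compute them; it is not the GCG gap program, whose exact identity definition may differ. The function names and the two short aligned peptides are hypothetical and serve only to demonstrate the arithmetic; the 3 differences and 5.4 kbp are the numbers given in the text.

```python
def percent_identity(aligned_a, aligned_b, gap_chars="-."):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a in gap_chars or b in gap_chars:
            continue  # skip gapped columns
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def sequence_accuracy(differences, total_bases):
    """Percent identity implied by a count of differences in redundant sequence."""
    return 100.0 * (1 - differences / total_bases)


# Hypothetical toy alignment, not taken from Figure 1.
toy_a = "MEDIGTSS-VWNMRY"
toy_b = "MEDLGTSSAVWNMKY"
print(round(percent_identity(toy_a, toy_b), 1))

# Three single-base differences in more than 5.4 kbp of redundant sequence.
print(round(sequence_accuracy(3, 5400), 3))
```

With three differences in 5400 bases the implied identity is about 99.94%, consistent with the stated bound of greater than 99.8%.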

PvAlf is encoded by a single-copy gene 

In Arabidopsis, ABI3 is a single-copy gene expressed only
in seeds (Giraudat et al., 1992; Parcy et al., 1994). We

Figure 1. Gene PvAlf encodes a protein similar to the Vp1 and ABI3 factors of maize and Arabidopsis.
The deduced amino acid sequence of PvAlf was aligned to the sequences of ABI3 (Giraudat et al., 1992), maize Vp1 (ZmVp1, McCarty et al., 1991) and rice
Vp1 (OsVp1, Hattori et al., 1994) using the computer program pileup (University of Wisconsin Genetics Computer Group). Identical amino acids are indicated
with vertical bars. The most conserved areas (domains I-IV) are highlighted in boldface type. The location of a 9 bp insertion coding for the tripeptide 'SNN'
found in a different PvAlf clone is indicated by an arrowhead.

Figure 2. PvAlf is encoded by a single-copy gene.
Total genomic DNA isolated from etiolated bean seedlings (Experimental
procedures) was size-fractionated by electrophoresis on a 0.75% agarose/
TAE gel either uncut (U) or after restriction digestion with XbaI (X), BamHI
(B) or HindIII (H). After transfer to a nylon membrane, PvAlf sequences were
detected by hybridization to a PvAlf 32P-labeled probe and autoradiography.



investigated the number of PvAlf genes in bean using blot
hybridization to genomic DNA digested with the restriction
enzymes XbaI, BamHI and HindIII. A sample of uncut DNA
was included as a control. DNA blots were hybridized to a
radiolabeled PvAlf probe under high-stringency conditions.
The autoradiograph on Figure 2 shows the presence of 
a single PvAlf gene copy in bean. An additional cross- 
hybridizing band was detected at lower stringency, 
although it is unclear whether it represents a second gene 
bearing a low degree of homology to PvAlf. 

PvAlf expression is seed-specific and developmentally
regulated during embryogeny

The organ and temporal distribution of PvAlf mRNA expres- 
sion were analyzed by RNA blot hybridization. Total cellular 
RNA was isolated from leaves, roots, seed pods, callus 
and eight stages of 'cotyledons' (cotyledons plus embry- 
onic axis) ranging from early cotyledon to pre-desiccation 
collected between 9 and 35 days after the opening of the 
corollas (days after flowering or DAF). Self-pollination 
occurs in bean before the corollas open. The same filter 
was hybridized sequentially to PvAlf, phaseolin, phyto- 
hemagglutinin (PHA-L) and 18S rRNA (rRNA) probes. The 
relevant portions of each autoradiograph are shown in 
Figure 3. PvAlf mRNA expression was detected only in 
cotyledons. Even after prolonged exposure no signals 
could be seen in the lanes for leaf (L), root (R), seed pod 
(P) and callus suspension culture (C), all of which were 
loaded with the same amount of RNA (see rRNA panel). 
PvAlf mRNA was also absent from senescing leaves and 
flowers (data not shown). The developmental series of
cotyledon stages revealed that the steady-state level of
PvAlf mRNA was modulated during embryogeny. After the 
9 DAF stage, PvAlf mRNAs segregated into two different

Figure 3. PvAlf expression is embryo-specific and developmentally
regulated during embryogeny.
Total cellular RNA was isolated from leaves (L), roots (R), seed pods (P),
callus suspension culture (C) and 'cotyledons' (cotyledons plus embryonic
axis) at 9, 10, 12, 14, 18, 22, 27 and 35 days after flowering (DAF). The
same amount of each RNA (10 µg per lane) was separated on a formaldehyde
denaturing agarose gel, transferred to a nylon membrane and sequentially
hybridized to 32P-labeled probes specific for PvAlf, phaseolin, PHA-L and
18S rRNA. Exposure times varied from 2-4 h for phaseolin, PHA and 18S
rRNA to 4 days for PvAlf.



size classes which increased in their abundance until 
approximately 14 DAF. This was followed by an interval of 
decreased expression lasting at most 6 days. A second 
period of increased expression was observed between 22 
and 36 DAF. By contrast, the maturation markers phaseolin 
and PHA-L were induced at 10 DAF and their mRNAs were 
highest during the 14-22 DAF period. This is in agreement 
with expression profiles reported previously for both genes 
(Murray and Kennard, 1984; Staswick and Chrispeels, 1984). 
From this analysis we conclude that PvAlf mRNA expres- 
sion is regulated in a complex manner during bean 
embryogeny. The first wave of PvAlf mRNA induction 
precedes the onset of maturation-specific gene expression 
by at least 2-3 days; a second wave of PvAlf mRNA 
expression occurs later in embryogeny, after the abund- 
ances of maturation-specific mRNAs begin to decrease.

Figure 4. Constructs used in PvAlf trans-activation assays.
Promoters and upstream DNA regulatory sequences driving the uidA gene
encoding β-glucuronidase: Phaseolin p-GUS contains a seed-specific
β-phaseolin promoter (-302 to +20) comprising upstream activating
sequence-1 (UAS1, -302 to -106) and TATA region (Bustos et al., 1991);
PHA/35S p-GUS has the PHA -247 to -65 enhancer (Riggs et al., 1989) fused
to a truncated CaMV 35S promoter (-64). Both genes have β-phaseolin
polyadenylation and 3'-flanking sequences (Phaseolin 3'). The effector
plasmid pJIT-PvAlf contains a PvAlf cDNA inserted in the polylinker region
of pJIT82 between CaMV 35S promoter (-900 to +1) and polyadenylation
(CaMV 3') sequences.

Gene PvAlf activates seed maturation-specific promoters 
in bean 

A promoter trans-activation assay based on particle 
bombardment of bean cotyledon tissues with recombinant 
DNA was used to explore the relationship between PvAlf 
and phaseolin or PHA-L expression. Figure 4 depicts the 
structures of two reporter constructs designed to monitor 
transient gene expression: Phaseolin p-GUS consisted of
a β-phaseolin gene fragment (-302 to +20) defined
previously as a minimal seed-specific promoter in transgenic
tobacco plants (Bustos et al., 1991), driving the uidA
gene from Escherichia coli that encodes β-glucuronidase
(GUS); PHA/35S p-GUS consisted of an upstream PHA-L
promoter fragment (-247 to -65) fused to a CaMV 35S
TATA-containing fragment (-64 to +1) also driving the uidA
reporter gene. The -247 to -65 PHA-L fragment is necessary
for seed-specific expression in tobacco (Riggs et al., 1989).
The PvAlf effector plasmid pJIT-Alf contained a full-length
PvAlf cDNA (encoding amino acids 1-753) under the control
of a CaMV 35S promoter (-900 to +1) and 35S termination
signals (CaMV 3') of plasmid pJIT82 (a gift from
Dr D. Helinski, UCSD, La Jolla, CA). To control for bombard- 
ment efficiency parallel experiments were carried out with 
construct pJIT-GUS containing the uidA gene in vector 
pJIT82. 

Reporter and effector plasmids were introduced into

Figure 5. PvAlf activates expression from promoters containing phaseolin
and PHA regulatory sequences in cotyledon and leaf tissues.
Reporter and effector plasmids were introduced into bean cotyledon and
leaf cells by microprojectile bombardment using a Helium-driven apparatus
(Bio-Rad). Bars represent the mean of five to seven individual
bombardments ± SE.
(a) Phaseolin reporter gene. Plasmid Phaseolin p-GUS was bombarded
together with an equal amount of either pJIT82 (empty bars) or the effector
pJIT-PvAlf (cross-hatched bars). To estimate transfection efficiency, pJIT-GUS
was bombarded with an equal amount of pJIT82 (vertical hatch).
(b) PHA reporter gene. Conditions were as in (a), except that plasmid
PHA/35S p-GUS was used as reporter.



cotyledon cells using a Helium-driven Biolistic particle 
delivery apparatus (Bio-Rad). Figure 5(a) presents data on 
rrans-activation of the Phaseolin p -GUS reporter in coty- 
ledons at two different stages, 10-14 and 13-16 mm, 
corresponding to age ranges of 14-22 DAF (median 18 
DAF), and 21-25 DAF (median 23 DAF). The first cotyledon 
stage was chosen to match the period of maximal abundance 
of endogenous phaseolin and PHA mRNAs, while the 
later stage corresponded to the period of decline in both 
types of mRNA. In every case, GUS activities are expressed 
in picomoles (pmol) of 4-MU produced per hour and were 
calculated after subtracting background values from identical 
tissue samples that had not been subjected to particle 
bombardment. All bars represent the average of five to 
seven repetitions. A 1:1 ratio of effector to reporter plasmid 
DNA produced significant and consistent enhancements 
(indicated in boldface type) in reporter gene activity. The 
effect was more pronounced (fivefold versus twofold) in 
10-14 mm cotyledons, although this difference may be the 
result of higher activity of the CaMV 35S promoter in 
younger embryos. To determine whether PvAlf activation 
of the phaseolin promoter requires the presence of other 
seed-specific factors, the same experiment was performed 
on primary leaves (3-5 days into germination) and mature 
leaves. In both cases, a large net activation was observed 
in the presence of PvAlf although the effect was relatively 
stronger (21-fold versus fivefold activation) in mature 
leaves. 
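
The fold-activation values quoted above follow from simple arithmetic on replicate GUS readings: subtract the background of unbombarded tissue, average the replicates, and divide the effector mean by the empty-vector mean. The sketch below illustrates that calculation only. The function names and all numerical readings are hypothetical and are not data from this study.

```python
import math
import statistics


def net_activity(raw_readings, background):
    """Subtract the unbombarded-tissue background from each raw reading."""
    return [reading - background for reading in raw_readings]


def mean_and_se(values):
    """Return (mean, standard error of the mean) for replicate values."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


def fold_activation(with_effector, with_vector):
    """Mean activity with effector divided by mean activity with empty vector."""
    return statistics.mean(with_effector) / statistics.mean(with_vector)


# Hypothetical readings in pmol 4-MU per hour, five bombardments each.
background = 10.0
reporter_plus_vector = net_activity([110.0, 90.0, 100.0, 120.0, 80.0], background)
reporter_plus_effector = net_activity([460.0, 510.0, 410.0, 490.0, 430.0], background)

vector_mean, vector_se = mean_and_se(reporter_plus_vector)
fold = fold_activation(reporter_plus_effector, reporter_plus_vector)
print(f"vector: {vector_mean:.1f} ± {vector_se:.2f}; fold activation: {fold:.1f}")
```

Because the ratio is taken between two means that each carry their own standard error, a fold value should be read together with the spread of both groups.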

Similar experiments were performed with the PHA 
reporter gene on 11-14 mm cotyledons and mature leaves 
(Figure 5b). The results were nearly identical to those 
obtained with the phaseolin driven reporter indicating 
that both promoters are targets for activation by PvAlf. 
Moreover, control experiments showed that PvAlf has no 
effect on a minimal (-64 to +20) phaseolin promoter 
fragment or on the CaMV 35S promoter (data not shown) 
indicating that the effect requires specific cis-acting elements 
present in the upstream enhancers of phaseolin and 
PHA genes. From these experiments we conclude that 
transient gene expression of reporter genes driven by 
maturation-specific promoters can be activated by recom- 
binant PvAlf. Moreover, if additional factors are required 
for PvAlf-mediated activation, they must be present in 
leaves as well as embryos. 

The PvAlf protein contains a transcriptional activation 
domain near its N-terminus 

Although recombinant PvAlf appeared to function as a 
positive regulator of maturation-specific promoters, it was 
not immediately obvious whether it acted at the level of 
transcription or as a regulator (e.g. protein kinase) of 
endogenous transcription factors. McCarty et al. (1991) 
demonstrated the presence of a transcriptional activation 
domain near the N-terminus of Vp1; however, the homo- 
logy of the N-terminal regions of PvAlf and Vp1 is so low 
(see alignment on Figure 1) that it would be unwise to 
assume that PvAlf is also active in transcription. Therefore, 
we ascertained whether tethering PvAlf to an upstream 
location in a heterologous promoter would lead to transcriptional 
activation in vivo. 

Figure 6. N-terminal fragments of PvAlf function as transcriptional activation 
domains in yeast. 

(a) Diagrams represent fusions between the DNA-binding domain (amino 
acids 1-147) of GAL4 and PvAlf polypeptides. DB, DNA-binding domain; 
AD, GAL4 transcriptional activation domain. Cross-hatched boxes indicate 
PvAlf sequences. Numbers on the left show the length of PvAlf peptides 
(in amino acids) included in each fusion. 

(b) β-galactosidase activity (Miller units) in yeast cultures transformed with 
GAL4:PvAlf expression plasmids. 

To that end, three PvAlf ORFs 
coding for N-terminal polypeptides of increasing length 
(117, 243 and 756 amino acids) were fused to the DNA- 
binding domain of the yeast transcription factor GAL4 
(amino acids 1-147, Ma and Ptashne, 1987). The GAL4 
DNA-binding domain (GAL4DB) recognizes a specific, 17 bp 
(17-mer) consensus sequence but is insufficient to activate 
transcription on its own. Figure 6(a) depicts these gene 
fusions along with a complete GAL4 gene. These plasmids 
were used to transform yeast strain SFY526 harboring a 
chromosomal copy of the GAL4-responsive gene GAL1:lacZ 
(Bartel et al., 1993). Transformants were first tested 
for induction of β-galactosidase activity using a filter assay. 
Individual colonies were subsequently grown in liquid 
culture and β-galactosidase activity was quantified by a 
colorimetric assay (Miller, 1972). Figure 6(b) shows a plot 
of the amount of β-galactosidase activity (expressed in 
Miller units) produced by each fusion gene. As expected, 
the GAL4 DNA-binding domain alone was unable to activ- 
ate lacZ expression. Addition of the first 117 amino acids 
of PvAlf increased expression by 10- to 15-fold over the 
GAL4DB value. Although low, this degree of activation was 
highly reproducible. Adding PvAlf amino acids 118-243 
increased activation by another order of magnitude to 
approximately 140 times the GAL4DB value. Inclusion of 
remaining amino acids at the C-terminus of PvAlf yielded 
a very small and more variable increase in β-galactosidase 
activity. These experiments demonstrated that PvAlf activates 
transcription when targeted to an upstream promoter 
location, and that the N-terminal 243 amino acids of PvAlf 
function as an efficient transcriptional activation domain 
in yeast. 

The function of the PvAlf N-terminal activation domain 
was confirmed in bean cotyledon cells using the particle 
bombardment procedure. For that purpose, a reporter gene 
was used that contained a synthetic, GAL4-responsive 
promoter driving the gene for chloramphenicol acetyl 
transferase (CAT). This construct, a generous gift from 
Dr Jun Ma, is designated as GAL4(17mers):TATA-CAT in 
Figure 7; it contains 10 copies of a 17 bp consensus GAL4-binding 
site fused to the CaMV 35S TATA fragment (Ma 
and Ptashne, 1987). A version of this promoter lacking 
GAL4-binding sites (TATA-CAT) was used as a negative 
control. Plant effector plasmids GAL4::PvAlf(1-117) and 
GAL4::PvAlf(1-243) were based on the same vector (pJIT82) 
used to demonstrate trans-activation by recombinant PvAlf 
protein (Figure 4). After a 24-36 h period of incubation, the 
tissue was homogenized and CAT activity quantified using 
[14C]chloramphenicol and acetyl-CoA (Fromm et al., 1987). 
Autoradiographs of typical TLC separations are shown in 
Figure 7(a). The amount of radioactivity corresponding 
to chloramphenicol and acetylated chloramphenicol was 
quantified using a Phosphorimager (Molecular Dynamics). 
Corresponding CAT activity values are displayed in 
Figure 7(b). A small increase in CAT expression (4-fold) 
was observed with the GAL4::PvAlf(1-117) construct. By 
contrast, construct GAL4::PvAlf(1-243) caused a very large 
increase (71-fold) in the activity of the reporter gene. Both 
effects were observed only with the construct containing 
GAL4-binding sites, demonstrating that binding of the 
GAL4::PvAlf fusion proteins to the promoter was a pre- 
requisite for gene activation. These results were entirely 
consistent with those obtained previously in yeast and 
confirmed the function of the PvAlf N-terminal region as a 
transcriptional activation domain. 

The PvAlf N-terminal transcriptional activation region 
resembles acidic domains of eukaryotic transcription 
factors GCN4, HAP4 and VP16 

Figure 8 compares the amino acid sequences of the activation 
domains of plant VAlf proteins PvAlf and Vp1, with 
analogous activation domains of yeast factors HAP4 
(Forsburg and Guarente, 1989) and GCN4 (Hope and Struhl, 
1986) and the herpes simplex virus (HSV) virion protein 
VP16 (Regier et al., 1993). The corresponding N-terminal 
region of A. thaliana ABI3 (AtABI3) and a mutant VP16 
protein, VP16ANC413-429, that retains most of the transcriptional 
activation capacity of VP16 (Triezenberg et al., 
1988) are also shown. All seven proteins share a central 
subdomain highly enriched in serine, threonine and proline 
(STP clusters, stippled boxes) surrounded by two acidic 
(negatively charged) regions. 

Figure 7. The PvAlf N-terminal activation domain is functional in bean 
cotyledon cells. 

Analysis of chloramphenicol acetyl transferase (CAT) activity in immature 
bean cotyledons co-transfected with CAT reporter genes GAL4(17mers):TATA-CAT 
or TATA-CAT and plasmids pJIT82 (+JIT), GAL4::PvAlf(1-117), 
GAL4::PvAlf(1-243) and GAL4 DNA-binding domain (GAL4DB). 

(a) Autoradiographs of TLC separations of [14C]Cam and [14C]Ac-Cam. 

(b) Plot of CAT activity (pmol of Ac-Cam produced per h) calculated from 
TLC chromatography using a Phosphorimager (Molecular Dynamics). Bars 
correspond to the mean ± SE. Numbers in boldface type indicate fold 
enhancement over the control (+JIT). 

Figure 8. The acidic activation domains of plant VAlf proteins. 

The amino acid sequence of the acidic activation domains of PvAlf and Vp1 are compared with a corresponding region in domain I of ABI3 (AtABI3), with 
the acidic domains of yeast transcription factors HAP4 and GCN4, and with the C-terminal acidic domain of VP16. VP16ANC413-429 is a mutant VP16 product 
lacking amino acids 413-429 that activates transcription at nearly wild-type levels. Acidic residues (glutamic and aspartic acid) are marked with circled '-' 
signs. Clusters of serine, threonine and proline residues (STP clusters) are highlighted with stippled boxes. Bulky hydrophobic amino acids in the acidic 
subdomains are shown enclosed in rectangles. 

The sequences shown in 
Figure 8 have been arranged in order of decreasing length 
of the central subdomain which varies from approximately 
60 amino acids in PvAlf to only 17 in GCN4. The boundaries 
of this central region are delimited by the two acidic 
subdomains. Secondary structure prediction by the Chou 
and Fassman method (Chou and Fassman, 1974) suggests 
that the STP clusters confer upon the central domain a 
high propensity to form β-turns. The importance of the 
acidic subdomains rich in glutamic and aspartic acid can 
be deduced from their similarity to acidic activation 
domains of GCN4 (Hope and Struhl, 1986) and VP16 (Cress 
and Triezenberg, 1991). In these factors, bulky hydrophobic 
amino acids (phenylalanine, tyrosine and valine) of the 
acidic subdomains are critical for transcriptional activation. 
Similar hydrophobic residues are also present in the acidic 
subdomains of plant VAlfs (square symbols). Although 
initially (Hope and Struhl, 1986) the activation domain of 
GCN4 was localized to the acidic region on the C-terminal 
side of the central STP domain (Figure 8), recent work 
has demonstrated that GCN4 contains a second acidic 
subdomain on the N-terminal side of the STP cluster that 
is also active in transcription (Drysdale et al., 1995). In the 
case of PvAlf, a 10-fold increase in reporter gene expression 
resulted from adding the 1-117 peptide that contains the 
N-terminal acidic subdomain and the STP cluster. 
Approximately the same 10-fold increase resulted from 
adding the C-terminal acidic subdomain included in the 
118-243 peptide. This is consistent with the presence of 
two acidic subdomains on either side of the STP cluster 
contributing to the overall activity of the PvAlf activation 
domain. By contrast, the significance of the central STP 
cluster remains unexplored in plants and yeast. 



Discussion 

Gene PvAlf encodes a large, 756 amino acid protein similar 
to the late embryogenesis regulatory factors ABI3 and Vp1 
of Arabidopsis and maize; together with OsVp1 (the rice 
equivalent of Vp1) these four proteins compose a class, 
designated here as VAlf. A computer-assisted alignment 
of VAlf primary amino acid sequences (Figure 1) extends 
the observation made first by Giraudat et al. (1992) and 
later by Hattori et al. (1994), that the similarity among 
these proteins is confined to four domains (I-IV). The 
conservation of VAlf domains I-IV contrasts with the widely 
divergent sequences of intervening segments that separate 
those domains. Such alternation of high sequence 
similarity and divergence constitutes strong evidence that 
VAlf proteins consist of several structural modules. A 
modular organization is common among transcription factors 
whose various functions such as DNA-binding, activation, 
protein:protein contacts, and small ligand binding are 
segregated into distinct and largely independent globular 
domains. At the same time, regions of low sequence 
similarity could represent functional specializations 
reflecting the different cellular environments in which each 
VAlf protein must perform. New procedures for legume 
transformation now offer a unique opportunity to address 
these questions by comparing the functions of hybrid 
ABI3-PvAlf proteins in transgenic Arabidopsis and bean 
(or soybean) plants. The widespread occurrence of VAlf 
proteins also suggests that they are essential for the normal 
development of plant species exhibiting a maturation 
phase of embryogeny, and raises the question of whether 
species that do not undergo embryonic maturation, such 
as ferns, lack functional VAlf genes. 

Homozygous abi3 and vp1 seeds fail to accumulate 
maturation-specific products in endosperm and embryonic 
cells (McCarty et al., 1991; Nambara et al., 1992, 1994; 
Parcy et al., 1994) and evidence that the products of these 
two loci are active in transcription continues to mount. For 
instance, recombinant Vp1 and OsVp1 proteins transiently 
expressed in electroporated cell protoplasts enhance the 
activity of maize Em and C1 promoters (Hattori et al., 1992, 
1994; McCarty et al., 1991), and ectopic expression of ABI3 
in transgenic seedlings leads to activation of maturation 
and late embryogenesis markers in leaves. We found that 
a recombinant PvAlf gene also activates gene expression 
from the promoters of two bean maturation-specific 
markers, phaseolin and PHA-L, in embryonic and non- 
embryonic cell types. Preliminary results suggest that the 
PvAlf effector plasmid pJIT-Alf may also induce expression 
of phaseolin mRNA in bean leaf cells. The fact that ABI3 
and PvAlf proteins seem to function in leaves is very 
significant, since neither gene is normally expressed outside 
of the seed environment (Figure 3 in this paper and 
Giraudat et al., 1992). This indicates that no other seed-specific 
factors may be needed for transcriptional activation 
by PvAlf or ABI3. Alternatively, a factor present in leaves 
may be able to substitute partially or completely for seed- 
specific proteins that normally assist ABI3 and PvAlf func- 
tion in the embryos. For instance, different R genes that 
regulate anthocyanin biosynthesis in maize are expressed 
with distinct tissue specificities (e.g. S gene in aleurone 
versus P gene in anther and coleoptile or Lc gene in midrib, 
pericarp and other tissues) but encode transcription factors 
with essentially the same function (Ludwig et al., 1989). 

The time course of phaseolin, PHA-L and PvAlf mRNA 
accumulation in developing bean embryos raises several 
interesting questions for future investigation. Phaseolin 
and lectin genes are coordinately regulated between 10 
and 27 DAF. Their induction several days after the initial 
increase in PvAlf mRNA expression is consistent with their 
position downstream of PvAlf in a control hierarchy. This, 
and the fact that PvAlf is sufficient to activate de novo 
GUS expression from phaseolin and PHA-L promoters 
support a role for the PvAlf protein as a positive regulator 
of maturation-specific gene expression in bean. More 
definitive proof will come from immunological detection 
of PvAlf protein in embryonic tissues using specific antibod- 
ies, and from the use of an antisense RNA approach to 
suppress PvAlf synthesis in transgenic embryos. Presum- 
ably, PvAlf activates gene transcription by interacting, 
directly or via a facilitator protein, with specific cis-acting 
motifs of phaseolin and PHA-L promoters. The only obvious 
similarities between the phaseolin and PHA-L upstream 
sequences present in reporter constructs Phaseolin p -GUS 
and PHA/35S P -GUS are three instances of the motif 
5'-CATGCAY-3', homologous to RY-repeats of many legume 
seed-expressed genes (Dickinson et al., 1992; Hoffman and 
Donaldson, 1985; Thomas, 1993), and several occurrences 
of the sequence 5'-G(A/C)CAC(G/C)TCA-3'. The latter binds to 



two basic leucine-zipper (bZIP) proteins, PvSF1 and VBP1, 
recently cloned from bean embryos in our laboratory 
(Chern et al., submitted). Phaseolin and PHA-L RY-repeats 
are similar to the Sph1 element required for Vp1-mediated 
activation of the C1 promoter in maize cell protoplasts 
(Hattori et al., 1992). We are currently investigating the 
possibility that the RY-repeats mediate transcriptional 
activation by PvAlf. Experiments are also underway to 
determine any possible interactions of PvAlf with cloned 
bZIP factors that bind to both promoters, or with other 
transcription factors and cis-acting elements that may also 
regulate these promoters. 
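
Locating candidate RY-repeats in a promoter amounts to scanning both DNA strands for the degenerate motif 5'-CATGCAY-3', where Y stands for C or T. The sketch below shows one way to do this with a regular expression built from IUPAC codes. The function names and the test sequence are hypothetical and do not correspond to the phaseolin or PHA-L promoters.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "H": "[ACT]", "B": "[CGT]",
    "V": "[ACG]", "D": "[AGT]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq, motif):
    """Return sorted (start, strand) hits; start is 0-based on the given strand.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = re.compile("(?=(" + "".join(IUPAC[base] for base in motif) + "))")
    seq = seq.upper()
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        # Convert the position on the reverse strand to forward coordinates.
        hits.append((len(seq) - m.start() - len(motif), "-"))
    return sorted(hits)


# Hypothetical 22-nt test sequence.
example = "TTCATGCATGGAACATGCACTT"
ry_hits = find_motif(example, "CATGCAY")
print(ry_hits)
```

The same function accepts any IUPAC motif, so the bZIP-binding sequence mentioned above could be searched by writing its alternative positions with the codes M (A or C) and S (G or C).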

Phaseolin and PHA-L mRNAs decrease in abundance 
and eventually disappear between 22 and 35 DAF while 
PvAlf continues to be expressed. A similar phenomenon 
has been described in Arabidopsis (Parcy et al., 1994) where 
ABI3-dependent, maturation genes such as cruciferin, are 
downregulated late in embryogenesis although the steady- 
state level of ABI3 mRNA remains constant. This pattern 
is also independent of the concentration of endogenous 
ABA in embryos. These observations could be explained 
by the presence of negative regulatory factors that repress 
MAT gene expression before abscission and desiccation. 
Run-off transcription experiments carried out with 
phaseolin, β-conglycinin and Kunitz trypsin inhibitor (Kti) 
genes have indicated that repression of maturation-specific 
genes may occur both at the transcriptional and post-transcriptional 
levels (Barker et al., 1988; Chappell and 
Chrispeels, 1986; Jofuku and Goldberg, 1989; Walling et al., 
1986). Apparently, ABI3-dependent late embryogenesis 
abundant (LEA) genes (Parcy et al., 1994) somehow remain 
impervious to this repression and continue to be expressed 
well into the desiccation stage (Finkelstein, 1993; Hughes 
and Galau, 1991). 

The functional importance of acidic, amphipathic α-helices 
within the Vp1 transcriptional activation domain 
has been suggested by McCarty et al. (1991) based on 
their similarity with acidic sequences in other eukaryotic 
transcription factors (Giniger and Ptashne, 1987). More 
recently, the importance of α-helices in these domains has 
been called into question (Cress and Triezenberg, 1991; 
Van Hoy et al., 1993), and now it appears that certain 
bulky hydrophobic residues embedded within a negatively 
charged structure are what dictate their activity. A very 
conspicuous feature of all four plant VAlf activation 
domains is the large serine-threonine-proline rich (STP) 
clusters that separate the acidic subdomains (Figure 8). 
These clusters resemble degradation sequences of plasma 
membrane proteins whose half-life is regulated by O-linked 
glycosylation (Kozarski et al., 1988). O-linked GlcNAc side-chains 
have been demonstrated on many nuclear proteins 
(reviewed by Hart et al., 1989) and near the activation 
regions of animal transcription factors (Jackson and Tjian, 
1988). Another attractive possibility is that the STP clusters 
represent targets for serine kinases similar to glycogen 
synthase kinase-3 of animal cells (Kemp and Pearson, 
1990). The ABA-insensitive-1 (ABI1) gene of Arabidopsis 
encodes a novel type of Ca2+-regulated phosphatase 2C 
(Leung et al., 1994; Meyer et al., 1994) indicating that 
phosphorylation/dephosphorylation reactions play a major 
role in the ABA signal transduction pathway (Bowler and 
Chua, 1994; Rock and Quatrano, 1994). We speculate that 
the STP clusters may have a regulatory role either by 
controlling the half-life of VAlf activation domains, or 
as phosphorylation sites. The PvAlf->phaseolin and 
PvAlf->PHA-L regulatory systems described here will allow 
a critical evaluation of these and other hypothetical functions 
of VAlfs in dicots. 

Experimental procedures 

Cloning of PvAlf 

A rapid amplification of cDNA ends (3'-RACE, Frohman et al., 
1988) strategy was used to clone PvAlf. Total RNA was obtained 
from maturing, 5-10 mm long embryos of the common bean 
(cv. 'Tendergreen') by extraction with guanidinium thiocyanate 
(Sambrook et al., 1989). Poly(A)+ RNA was isolated using the 
PolyATtract™ magnetic-bead system (Promega) and cDNA was 
synthesized with Superscript™ Reverse Transcriptase (Gibco 
BRL). This cDNA served as template for all PCR amplifications 
with the exception of those for the 5'-RACE procedure. Two 
degenerate oligonucleotides were synthesized, prALF3'R-1 
(5'-ATGGARGAYATHGGNAC-3'; 48 permutations) and prALF3'R-2 
(5'-GTNTGGAAYATGMGNTA-3'; 64 permutations). A first 
amplification was carried out with one gene-specific primer 
(prALF3'R-1) and the 3' adapter primer. After 35 cycles, remaining 
primers were eliminated by ultrafiltration, the products were 
diluted and a second 
amplification was performed with prALF3'R-2 and 3' adapter. The 
products of the second amplification were cloned (TA cloning™, 
Invitrogen) and sequenced. Two nested, specific downstream 
primers were synthesized based on the 3 '-RACE product sequence, 
prALFDWN-1 (5'-GGTTTCACACCTTGTTG-3') and prALFDWN-2 
(5'-GCTGGGTTTTCTGCGAT-3'). A degenerate oligonucleotide, 
prALFUP-1, (5'-CTYCCNGAYTTYCCNTG-3'; 128 permutations) was 
synthesized from a conserved region (LPDFP) near the N-terminus 
of Vp1-like proteins. Nested PCR reactions were performed with 
the degenerate primer prALFUP-1 and either of the two PvAlf- 
specific primers, prALFDWN-1 and prALFDWN-2. A major product 
of the second reaction, 1.8 kb in length, was purified by agarose 
electrophoresis, cloned (TA cloning™) and sequenced to confirm 
its identity. A new primer was synthesized, prALF5'R 
(5'-AGGATCGAAGAAATCATTGGC-3') and used in the 5'-RACE System 
(GIBCO-BRL) to amplify the 5'-end of PvAlf mRNA using Poly(A)+ 
RNA as template. The products of the 5' RACE reaction were 
analyzed by Southern blot hybridization using the 1.8 kb PvAlf 
fragment. The largest cross-hybridizing products were isolated by 
gel electrophoresis and cloned. Recombinant clones were analyzed 
by colony hybridization (Sambrook et al., 1989) with the modification 
that colonies were poked through a nitrocellulose filter into 
antibiotic-containing LB agar plates and the filter/plate assembly 
incubated together. Inserts from hybridizing colonies were 
sequenced to identify those containing the beginning of the PvAlf 
open reading frame. 
Based on the collected sequences of 3' and 5'-RACE products, 
oligonucleotide primers prAlf5' (5'-CCGTCGACGCAAAGATGGAGTGTGAAGTGAAG-3') 
and prAlf3' (5'-CCGATATCCTGTTGACAGCCTCCATTGC-3') 
were synthesized to permit PCR amplification 
of the entire predicted coding region of PvAlf from cDNA. Two 
independent PCR reactions were performed and their products 
cloned. One of these full-length clones was completely sequenced 
to verify the sequence of PvAlf. 
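
The permutation counts quoted for the degenerate primers can be checked by multiplying, position by position, the number of bases each IUPAC code represents. The sketch below performs that check for the three degenerate primers listed above. The function name and lookup table are illustrative, not part of the original procedure.

```python
# Number of nucleotides represented by each IUPAC code.
IUPAC_COUNTS = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "M": 2, "K": 2, "S": 2, "W": 2,
    "H": 3, "B": 3, "V": 3, "D": 3,
    "N": 4,
}


def degeneracy(primer):
    """Return the number of distinct sequences encoded by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= IUPAC_COUNTS[base]
    return total


primers = {
    "prALF3'R-1": "ATGGARGAYATHGGNAC",
    "prALF3'R-2": "GTNTGGAAYATGMGNTA",
    "prALFUP-1": "CTYCCNGAYTTYCCNTG",
}
counts = {name: degeneracy(seq) for name, seq in primers.items()}
for name, count in counts.items():
    print(f"{name}: {count} permutations")
```

For prALF3'R-1, for example, the product is 2 (R) x 2 (Y) x 3 (H) x 4 (N) = 48, in agreement with the stated value.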

RNA expression analysis and DNA hybridization 

Various tissues of Phaseolus vulgaris were frozen in liquid nitrogen 
and total cellular RNA was extracted by a hot phenol method 
(Meier et al., 1993). The embryos were separated from seed 
coats and endosperm before freezing. RNAs were denatured 
with formaldehyde, resolved by agarose gel electrophoresis and 
transferred to nylon membranes (Nytran™, S&S). The membranes 
were pretreated in hybridization buffer (1 M NaCl, 10% dextran 
sulfate, 1% SDS, 150 µg denatured salmon sperm DNA) for 2 h 
at 65°C and hybridized in the same solution to a 1.8 kb PvAlf 
probe labeled with 32P-dCTP by the random primed method (USB). 
After washing (twice 30 min in 2x SSC, 1% SDS followed by 
twice 30 min in 0.2x SSC, 1% SDS at 60°C), the blots were 
autoradiographed using Kodak X-Omat X-ray film. Blots were 
stripped, and re-hybridized to similarly prepared probes for 
phaseolin, PHA-L and 18S rRNA. Total genomic DNA was isolated 
from etiolated bean seedlings by the CTAB method (Taylor and 
Powell, 1982), separated by electrophoresis on a 0.75% agarose/TAE 
gel, immobilized on Nytran™ (S&S) filters by capillary transfer 
and probed with an internal EcoRI restriction fragment of PvAlf 
as above. 

Plasmid constructs 

Hybrid genes containing PvAlf full-length or C-terminal truncated 
fragments were fused in-frame with the GAL4 DNA-binding 
domain in the yeast expression vector pGBT9 (Clontech). In every 
case, PvAlf sequences were amplified by PCR from a full-length 
cDNA clone (pPvAlf) using synthetic primers containing restriction 
sites. The nucleotide sequences of each primer were as follows: 

PvAlf(1-756): Alf3'-SmaI (TGGCCCGGGGATGGAGTGTGAAGTGAAG) 
and Alf5'-SalI (GGGGTCGACTAGTTTCGGTGCGATGAC); 

PvAlf(1-117): Alf1-EcoRI (GGGGAATTCATGGAGTGTGAAGTGAAG) 
and Alf2-BamHI (GGGGGATCCCTTCAAGACGGCCCGTG); 

PvAlf(1-243): Alf5'-SalI (GGGGTCGACTAGTTTCGGTGCGATGAC) 
and Alf6-SalI (GGGGTCGACCTCTCCTTTGATAACTTGAC). 

All PCR products were first cloned in the TA™ cloning vector 
pCR2000 (Invitrogen), then excised with appropriate restriction 
enzymes and ligated into pGBT9. For functional analysis of PvAlf 
fragments in bean, the GAL4:PvAlf(1-117) and GAL4:PvAlf(1-243) 
fusions were excised from pGBT9 by digestion with HindIII and 
BamHI or HindIII and SalI, and subcloned downstream of the 
CaMV 35S promoter in vector pJIT82 (a gift from Professor Donald 
Helinski, Department of Biology, University of California, San 
Diego, La Jolla, CA 92083, USA). The control plasmid pGAL4DB:JIT 
coding for the GAL4 DNA-binding domain was made by excising 
the PvAlf fragment of pGAL4:PvAlf(1-117)/JIT with EcoRI and religating 
the remaining plasmid with T4 DNA ligase. 
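
Each primer listed above is named for the restriction site it carries, which can be confirmed by searching the primer for the enzyme's recognition sequence. The sketch below does this using the standard recognition sequences of SmaI (CCCGGG), SalI (GTCGAC), EcoRI (GAATTC) and BamHI (GGATCC). The dictionary layout and function name are illustrative only, and primer names are taken as they appear in the list.

```python
# Standard recognition sequences of the enzymes named in the primer list.
SITES = {
    "SmaI": "CCCGGG",
    "SalI": "GTCGAC",
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
}

# Primer name -> (enzyme named in the primer, primer sequence 5' to 3').
PRIMERS = {
    "Alf3'-SmaI": ("SmaI", "TGGCCCGGGGATGGAGTGTGAAGTGAAG"),
    "Alf5'-SalI": ("SalI", "GGGGTCGACTAGTTTCGGTGCGATGAC"),
    "Alf1-EcoRI": ("EcoRI", "GGGGAATTCATGGAGTGTGAAGTGAAG"),
    "Alf2-BamHI": ("BamHI", "GGGGGATCCCTTCAAGACGGCCCGTG"),
    "Alf6-SalI": ("SalI", "GGGGTCGACCTCTCCTTTGATAACTTGAC"),
}


def site_position(primer_seq, enzyme):
    """Return the 0-based index of the enzyme's site in the primer, or -1."""
    return primer_seq.find(SITES[enzyme])


positions = {
    name: site_position(seq, enzyme) for name, (enzyme, seq) in PRIMERS.items()
}
for name, pos in positions.items():
    status = f"site found at index {pos}" if pos >= 0 else "site NOT found"
    print(f"{name}: {status}")
```

In every primer the site sits three nucleotides from the 5' end, so each site is preceded by a short 5' clamp, a common primer design that helps the enzyme cut near the end of a PCR product.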

Yeast transformation and β-galactosidase assays 

Functional assays were conducted in yeast strain SFY526 (Bartel 
et al., 1993) that has a lacZ reporter gene driven by a GAL1 
promoter and transformation markers trp1-901 and leu2-3. Yeast 
cells were transformed with expression vectors carrying different 
GAL4:PvAlf fusions by the Li-acetate method as described by Ito 
et al. (1983) and modified by Schiestl and Gietz (1989), Hill et al. 
(1991), and Gietz et al. (1992). Quantitative β-galactosidase assays 
were performed by growing individual colonies in minimal 
medium containing 2% dextrose to mid-log phase (OD600 = 1.0). 
Cell culture (0.1 ml) was added to 0.7 ml Z-buffer (60 mM 
Na2HPO4·7H2O, 40 mM NaH2PO4·H2O, 10 mM KCl, 1 mM 
MgSO4·7H2O at pH 7.0 and 40 mM β-mercaptoethanol) and the 
cells lysed by the addition of 50 µl CHCl3 and 50 µl 0.1% SDS. 
After the addition of 0.16 ml o-nitrophenylgalactoside solution 
(4 mg ml-1 in 0.1 M phosphate buffer at pH 7.0), the samples were 
incubated for 1 h at 37°C. Reactions were stopped with 0.4 ml 
Na2CO3 and cell debris removed by centrifugation at 13 000 g for 
10 min. The absorbance at 420 nm was determined and β-galactosidase 
activity expressed as in Miller (1972). Filter assays 
for qualitative β-galactosidase activity detection were performed 
by placing a sterile Whatman #1 filter on top of agar plates 
containing the transformed colonies which were subsequently 
submerged for 10 sec in liquid nitrogen to lyse the cells. The 
filters were placed on top of Whatman filters that had been 
presoaked in 1.8 ml Z-buffer containing X-gal at 0.33 mg ml-1. 
The filters were incubated at 30°C until blue color developed. 
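
For readers unfamiliar with Miller units, the sketch below shows the conventional calculation: 1000 x A420 / (t x V x OD600), with t in minutes and V the culture volume assayed in ml. It assumes the simplified form without the light-scattering correction term (1.75 x OD550) of the classical formula, on the assumption that removing cell debris by centrifugation makes that correction unnecessary. The absorbance values are hypothetical, and only the 60 min incubation and 0.1 ml culture volume come from the procedure above.

```python
def miller_units(a420, od600, minutes, culture_ml):
    """Beta-galactosidase activity in Miller units (simplified, cleared lysate).

    a420       absorbance of the cleared reaction at 420 nm
    od600      optical density of the culture at assay time
    minutes    incubation time with substrate, in minutes
    culture_ml volume of culture added to the reaction, in ml
    """
    return 1000.0 * a420 / (minutes * culture_ml * od600)


def fold_over_control(sample_units, control_units):
    """Activity of a fusion construct relative to the GAL4DB-only control."""
    return sample_units / control_units


# Hypothetical absorbance readings; 60 min and 0.1 ml follow the procedure above.
control = miller_units(a420=0.006, od600=1.0, minutes=60, culture_ml=0.1)
sample = miller_units(a420=0.600, od600=1.0, minutes=60, culture_ml=0.1)
print(f"control: {control:.1f} units; sample: {sample:.1f} units; "
      f"fold: {fold_over_control(sample, control):.0f}")
```

Normalizing to OD600 and assay time is what allows cultures of slightly different density to be compared on one scale, as in Figure 6(b).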

Particle bombardment of bean cotyledons and 
measurement of β-glucuronidase (GUS) and 
chloramphenicol acetyl transferase (CAT) activity 

Cotyledons from immature bean seeds (~17 days after flowering) 
were sliced longitudinally and placed on solid Gamborg's B5 
medium containing 0.8% agar, 3% sucrose and 0.75 M mannitol. 
Leaf discs (1 inch in diameter) were cut from mature bean leaves 
taking care to avoid major veins and placed on the same medium. 
Tungsten (1.7 µm diameter) or gold (1.6 µm diameter) microcarriers (0.2 mg per 
bombardment) were co-precipitated with equal amounts (0.5 µg 
per bombardment) of reporter and effector or reporter and vector 
(pJIT82) plasmids. Microcarriers were spotted on to macrocarriers 
and delivered on to the surface of the tissues with a He-driven 
particle gun (Bio-Rad) at 1550 psi. Bombarded tissues were incubated 
for 1-2 days at 28°C in the dark and then ground with a 
mortar and pestle in 400 µl of GUS extraction buffer or 0.25 M 
Tris-HCl at pH 7.5 (for CAT assays). Cell debris was spun down 
for 5 min at 13 000 g. GUS activity was measured as described in 
Jefferson (1987) and CAT activity as in Fromm et al. (1987). 
Abstract 

MYB proteins constitute a diverse class of DNA-binding proteins of particular importance in transcriptional reg- 
ulation in plants. Members are characterised by having a structurally conserved DNA-binding domain, the MYB 
domain. Different categories of MYB proteins can be identified depending on the number of imperfect repeats of 
the MYB domain they contain. It is likely that single MYB-domain proteins, a class of expanding importance in 
plants, bind DNA in a different way than two-repeat or three-repeat MYB proteins, and these groups are therefore 
likely to have different functions. The two-repeat (R2R3) MYB family is the largest family characterised in plants, 
and there are estimated to be over 100 members in Arabidopsis. Functions of MYB proteins in plants include 
regulation of secondary metabolism, control of cellular morphogenesis and regulation of meristem formation 
and the cell cycle. Although functional similarities exist between R2R3 MYB proteins that are closely related 
structurally, there are significant differences in the ways very similar proteins function in different species and also 
within the same organism. Therefore, despite the large number of R2R3 MYB proteins in plants, it is unlikely that 
many are precisely redundant in their functions, but more likely that they share overlapping functions. 



Introduction 

When the first plant regulatory gene was sequenced, 
the C1 gene of maize, it was recognised to encode 
a transcription factor from its similarity to the rel- 
atively well characterised mammalian transcription 
factor c-MYB (Paz-Ares et al., 1987). Since then the 
number of proteins with sequence similarity to the 
MYB domain has increased enormously, and it has 
been recognised that transcriptional control working 
through MYB-related transcription factors is particularly 
important in plants (Martin and Paz-Ares, 1997; 
Romero et al., 1998). 

What is a MYB-related protein? 

The MYB domain is a region of about 52 amino acids 
that binds DNA in a sequence-specific manner. In c- 
MYB (historically the prototypic MYB protein) this 
domain is repeated three times (R1, R2 and R3; Figure 1)
and each imperfect repeat adopts a helix-helix-turn-helix
conformation to intercalate in the major
groove of the target DNA. In plants the predominant 
family of MYB proteins have two repeats (R2, R3 
relative to the repeats of c-MYB; Figure 1). In addi- 
tion, three-repeat MYBs, closely related to c-MYB, 
have recently been identified in plants (Ito, in press) 
together with a growing number of MYB proteins with 
a single MYB domain (Figure 1). 

Diversity in the interaction between MYB proteins 
and DNA 

Structural studies of c-MYB have shown that both R2 
and R3 are required for sequence-specific binding, the 
C-terminal helix of each repeat being the recognition 
helix for DNA binding. Regularly spaced tryptophan 
residues (three per repeat) participate in a hydropho- 
bic cluster. It has been suggested that the recognition 
helix of R3 specifically interacts with the core of the
recognition sequence, whereas the recognition helix of
R2 is involved in less specific interactions with nucleotides
peripheral to the core (Ogata et al., 1995).
R2R3 repeat MYBs are believed to bind DNA in a
similar way.

Figure 1. Schematic showing functional domains of prototypic
MYB proteins; three MYB repeats (c-MYB), two MYB repeats
(C1) and one MYB domain (StMYB1).

Because in c-MYB both R2 and R3 are necessary 
for binding DNA, it is likely that proteins with single 
MYB domains bind DNA in a different manner. Structural
studies on the human telomeric protein, hTRF1,
which contains a single MYB domain, suggest that the 
C-terminal helix is longer than the equivalent helices 
in the repeats of c-MYB and, consequently, it does 
not interact with DNA in the same way. In hTRF1 it is
proposed that the protein binds DNA in a manner anal- 
ogous to homeodomain proteins, whose DNA-binding 
domains also form helix-turn-helix motifs (Nishikawa 
et al., 1998).

It remains to be seen whether other single MYB-domain
proteins bind DNA in a way related to hTRF1,
although generally the recognition helices in single 
MYB domain proteins are not particularly similar 
to those of c-MYB (relative to their conservation in 
R2, R3 MYB proteins). It may be that single MYB- 
domain proteins generally bind DNA in a manner 
similar to homeodomain proteins and as dimers (either 
hetero- or homo-dimers), which may have important 
repercussions for their modes of action and biological 
functions. 



Therefore, members of the superfamily of MYB 
proteins should be viewed as related principally by 
their ability to bind DNA, rather than on the basis of 
their physiological functions. Even given the attribute 
of DNA binding, there may be major differences in 
the ways MYB proteins bind DNA. This means that 
there are different target recognition sites for dif- 
ferent groups of MYB proteins, not only between 
single-domain MYBs and two/three-repeat MYBs, 
but also within these groupings. Mammalian three- 
repeat MYBs such as c-MYB, A-MYB and B-MYB 
and closely related proteins from invertebrates and 
cellular slime moulds all bind to the cognate site 
T/CAACG/TGA/C/TA/C/T (MBSI). Some plant two-repeat
proteins can recognise this binding site while
others cannot. Some of those plant MYB proteins that 
bind to MBSI will also bind to a second site, TAACTAAC
(MBSII), which is a sequence recognised by
the majority of plant R2R3 MYB proteins (Romero
et al., 1998). The group of plant R2R3 MYB proteins
that bind preferentially to MBSI (group A) are also 
more closely related in the primary structure of their 
DNA-binding domains to the c-MYB family. 

Therefore, broad distinctions in target site recogni- 
tion can be made between MYB proteins on the basis 
of the structure of their DNA-binding domains, which 
fall into distinct structural subgroups. However, within 
the plant R2R3 repeat MYBs, overlaps in binding site 
recognition have also been reported between members 
of the different subgroups, and it is likely that most 
MYB DNA-binding domains have considerable inher- 
ent flexibility in their ability to recognise target sites. 
The binding site preference and affinity of MYB proteins
are also likely to be strongly influenced by other
protein factors that interact with them. In terms of 
function, these generalisations mean that MYB pro- 
teins belonging to different structural sub-groups are 
unlikely to have similar functions because of differ- 
ences in their preferred binding sites. It does not mean 
the converse, however, i.e. that strong similarity be- 
tween the DNA-binding domains of MYB proteins im- 
plies a commonality of function. Flexibility in recog- 
nition, operating through a variety of mechanisms, 
may mean that proteins very similar in their DNA- 
binding domains control quite different target genes 
and therefore have quite distinct physiological functions.
For example, within the large family of R2R3
MYB proteins identified in plants three major subdivisions
can be made on the basis of the sequence of the
DNA-binding domain: subgroup A (whose members
are most similar to c-MYB and other animal MYB
proteins), subgroup B, which is a relatively small group
(4 members in Arabidopsis), and subgroup C, which
encompasses 70 members in Arabidopsis (from a total
of 83 defined so far), several members of which have
been shown to recognise the MBSIIG binding site
(T/CACCA/TAC/AC) preferentially (Romero et al.,
1998). Members of subgroup C include the Arabidopsis
gene AtMYBGL1, which is involved in trichome
specification and which may be required for promoting
endoreduplication and increasing cell size; the
Antirrhinum gene AmMYBMIXTA, which is involved in
specifying the formation of conical cells in petal epidermis
and which plays no role in cellular outgrowth
in Arabidopsis; and the maize genes ZmMYBC1, ZmMYBPL
and ZmMYBP, the Petunia gene PhMYBAN2
and the Antirrhinum genes AmMYBROSEA and AmMYBVENOSA,
which are all involved in regulating
anthocyanin production. Despite having very similar
DNA-binding domains, these subgroup C proteins
clearly have distinct physiological functions.
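
The slash notation used for degenerate binding sites in this section (each
"/" separates alternative bases at one position) can be converted into a
search pattern and applied to a DNA sequence. The following is only a minimal
sketch in Python and is not a method taken from the work reviewed here: the
function names consensus_to_regex and find_sites and the short test sequences
are hypothetical, only the forward strand is scanned, and a sequence match
says nothing about binding affinity or about binding in vivo.

```python
import re


def consensus_to_regex(consensus: str) -> str:
    """Convert slash notation such as 'T/CACCA/TAC/AC' into a regex string.

    Hypothetical helper: a base followed by '/X' gains X as an alternative
    at the same position, so 'T/C' becomes the character class '[TC]'.
    """
    parts = []
    i = 0
    while i < len(consensus):
        options = [consensus[i]]
        i += 1
        # Each '/' adds one more alternative base at the current position.
        while i + 1 < len(consensus) and consensus[i] == "/":
            options.append(consensus[i + 1])
            i += 2
        if len(options) == 1:
            parts.append(options[0])
        else:
            parts.append("[" + "".join(options) + "]")
    return "".join(parts)


def find_sites(sequence: str, consensus: str):
    """Return (start, matched_site) pairs on the forward strand only.

    A lookahead is used so that overlapping matches are also reported.
    """
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Site definitions as written in the text; the test sequences are made up.
    MBSII = "TAACTAAC"
    MBSIIG = "T/CACCA/TAC/AC"
    print(consensus_to_regex(MBSIIG))
    print(find_sites("GGTAACTAACGG", MBSII))
    print(find_sites("ttCACCTACCaa", MBSIIG))
```

A fuller treatment would also scan the reverse complement, since a factor can
bind a site in either orientation on double-stranded DNA.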

Evolution of MYB DNA-binding domains 

The diversity in organisation of MYB domains in 
proteins is understood best from an evolutionary per- 
spective. An attractive model for evolution of MYB 
proteins has been presented by Lipsick (1996). This 
model is based on early (over 1 billion years ago) 
duplication of the MYB domain to give multiple- 
repeat MYB proteins followed by later expansion of 
MYB proteins through duplication of entire genes. 
This expansion was relatively limited for the three- 
repeat MYB proteins in animals (vertebrates having 
just three closely related members, c-MYB, A-MYB 
and B-MYB). However, in plants the expansion of 
the R2R3 family was considerable, and there are es- 
timated to be over 100 members of this subfamily in 
Arabidopsis, and equivalent or larger numbers in other 
plant species. 

The Lipsick model for the evolution of MYB re- 
peats remains consistent with new sequence informa- 
tion derived since 1996 (Figure 2). However, it has 
become apparent that plants have retained genes en- 
coding three-repeat MYBs (Ito, in press) in addition to 
expanding the R2R3 family (which is believed to have 
been amplified after loss of R1 from a three-repeat
ancestor; Lipsick, 1996). 

Interestingly, the function of three-repeat MYBs 
from plants may also have been conserved since they 
are thought to be involved in cell cycle control, regulating
the expression of cyclins (Ito, in press). In
animals, c-MYB, A-MYB and B-MYB are associ- 
ated with the promotion of cellular proliferation which 
may operate through control of the cell cycle. The 
three-repeat MYBs in plants may represent descen- 
dants of an ancient member whose structure has been 
conserved during the evolution of plants and animals, 
and whose function may be relatively similar in these 
different kingdoms. 

The one-MYB-domain family has also been expanded
in plants; adding to the list, which includes StMYB1
(the first member), are two single-domain MYB proteins
from Arabidopsis (CCA1 and LHY) believed to
operate as oscillators close to or part of the circadian
clock governing flowering, leaf movements, photosynthetic
gene expression and hypocotyl growth. Two
other related single MYB domain proteins have also
been identified in Arabidopsis but their functions have
not been characterised. In addition, the CAPRICE
gene (CPC) encodes a single-domain MYB protein
with a role in root hair formation in Arabidopsis.

The family of one MYB domain proteins, which 
recognise G-rich telomeric sequences (TBFs), have 
been conserved during the evolution of yeasts, animals 
and plants (Bilaud et al., 1996). This family of MYB
proteins may also be able to control transcription; the 
plant members of the family have been shown to work 
as transcriptional activators. 

Two-repeat MYBs closely related structurally to 
the product of the Schizosaccharomyces pombe gene, 
cdc5, have been conserved in fungi (present in 
S. pombe and Saccharomyces cerevisiae), animals and
plants (Ohi et al., 1998), and their function is also
thought to have been conserved (regulation of G2/M 
progression in the cell cycle). These cdc5p-related 
proteins contain a third region just C-terminal to the 
second MYB repeat which has very weak similarity
to classic MYB repeats (it lacks the regularly
spaced tryptophan residues, for example) and which could
represent a highly diverged third repeat.

Another MYB (DMP1), conserved in animals and 
yeast, binds to D cyclins and is thought to regulate the 
cell cycle. This is a three-repeat MYB with strongest 
similarity in its repeats to R1/R2 of c-MYB. The third 
repeat shows little conservation of structure with the 
classic c-MYB R3 domain except in maintaining the 
regularly spaced tryptophan residues. In addition to 
binding to D cyclins, these MYB proteins can also 
activate transcription (Hirai and Sherr, 1996). 

Figure 2. Model for the evolution of MYB proteins in the major eucaryotic kingdoms; slime moulds, animals, plants and fungi (modified from
Lipsick, 1996). It is proposed that a single original MYB repeat was replicated (r) to give rise to two- and three-repeat MYB proteins. The
majority of plant MYB proteins are thought to have evolved following loss (del) of the first repeat (R1) to give the expanded R2R3 family.
Different subgroups within this large family are thought to have arisen by duplication of entire genes (d). Three-repeat MYBs such as MSA BP
from tobacco have been maintained in plants (Ito, in press). Other proteins such as the cell cycle regulator Spcdc5 and the telomere-binding
proteins have been conserved in animals (Hscdc5 and hTRF1, respectively), plants (Atcdc5 and IBP1/BPF, respectively) and fungi (Spcdc5
and TBP, respectively), and it seems likely that their functions have been conserved as well. (Hs, Homo sapiens; Dm, Drosophila melanogaster;
Dd, Dictyostelium discoideum; An, Aspergillus nidulans; Sp, Schizosaccharomyces pombe; Sc, Saccharomyces cerevisiae; St, Solanum tuberosum;
Am, Antirrhinum majus; Zm, Zea mays; At, Arabidopsis thaliana).

From studies of amino acid homologies the Lipsick
model suggested that the R2R3 MYB-related proteins
arose after loss of the sequences encoding R1 in an
ancestral three-repeat MYB gene. It is likely that, upon
loss of R1, several subgroups of genes encoding R2R3
MYB proteins went through selective amplification
and subgroup expansion during plant evolution (Lipsick,
1996; Romero et al., 1998; Figure 2). More
than 80 Arabidopsis genes encoding R2R3 MYBs 
have been specifically cloned and analysed (Romero
et al., 1998). This analysis identified three distinct
R2R3 subgroups on the basis of the structure of part 
of their DNA-binding domain. It seems likely that the 
three Arabidopsis R2R3 MYB subgroups (A, B, C) 
were derived from different R2R3 ancestors. In addition,
the family of R2R3 MYBs characterised by
PHANTASTICA (AmMYBPHAN) from Antirrhinum
majus and ROUGHSHEATH 2 (ZmMYBRS2) from
maize represents an independent subgroup because the
sequence of the recognition helix in R3 (which is involved
in making base contacts) is very different in
AmMYBPHAN and ZmMYBRS2 from that found in c-MYB
or other R2R3 MYBs of plants (Timmermans
et al., 1999). The AmMYBPHAN subgroup of MYBs
has been implicated in meristem initiation and control
of the dorso-ventral axis in shoot organs (Table 1).

Diversity within the C-terminus of plant MYBs 

Most MYB proteins are presumed to be transcriptional
activators with activation domains in the region C- 
terminal to the DNA-binding domain, since c-MYB 
has an activation domain C-terminal to its DNA- 
binding domain, which is acidic (Weston, 1998). A 
few plant MYBs have been tested as transcriptional 
activators, and most will activate transcription, the 
activation domain generally being predicted to form 
an amphipathic α-helix. However, sequences in the
C-terminal regions of MYB proteins are not strongly 
conserved, presumably because the structural determi- 
nants for activation domains are fairly flexible. 

Not all MYB-related proteins need be transcriptional
activators. There is circumstantial evidence that
some may serve to reduce target gene expression, al- 
though no repression domains have yet been defined. 
In principle, silencing of target gene expression can 
also be achieved by competition for DNA-binding 
sites with other transcriptional activators or by bind- 
ing and titrating out other activators themselves so 
that activation of transcription is reduced. There are 
currently no data on the biochemical action of plant 
MYB proteins with respect to the repression of gene 
transcription. 

A survey of the predicted C-terminal sequences 
of the Arabidopsis R2R3 MYB family revealed 22 
different subgroups which showed limited sequence 
conservation within their C-terminal regions (Kranz 
et al., 1998). These conserved motifs might represent
activation domains, an idea supported by the fact that 
some are relatively acidic, and that others are rich in 
amino acids frequently associated with activation do- 
mains (glutamine, proline); the sequence requirements 
for activation domains are known to be flexible. Alter- 
natively, these regions of sequence conservation may 
represent repression domains or domains for interac- 
tion with other transcription factors, although in the 
only case of plant MYB protein interaction that has 
been reported, that of the maize protein ZmMYBC1
with bHLH factors B, R, Lc and Sn, the interaction
is believed to involve the DNA-binding domain of the
MYB protein (Martin and Paz-Ares, 1997).

Multifunctionality of plant MYB proteins 

Interest in the diversity of MYB proteins in plants 
clearly stems from an ultimate interest in their func- 
tionality. Faced with the plethora of MYB genes, can 
conclusions be drawn about their likely functions on 
the basis of their primary amino acid sequences? In 
addition, why do plants have so many R2R3 repeat 
MYBs and how does the individual activity of each 
member relate to the activity of other members? 

Considering all MYB-related proteins, it is clear 
that there is a wide diversity of function. For example, 
the primary biological role of the MYB proteins that 
bind telomeric sequences may be structural, and their 
role as transcription factors may be secondary. Other 
single MYB domain proteins clearly function primar- 
ily as transcription factors; the biochemical functions 
of some of these are related to the rhythmic changes in 
gene expression associated with the circadian clock, 
while another has a role in root hair formation. 

The R2R3 MYB family in plants is large and its 
functions are diverse (Table 1). The only uniting fea- 
ture is that most members of the family seem to be 
involved in 'plant-specific processes' involving con- 
trol of secondary metabolism or response to secondary 
metabolites unique to plants or cellular morphogenesis
unique to plants (Martin and Paz-Ares, 1997).
Although this generalisation may serve to address the 
question why there was wide expansion of the R2R3 
MYB gene family in plants it does not help very much 
in predicting the function of the, as yet, uncharac- 
terised members. Ultimately, our understanding of the 
range of influence of MYB-related proteins in plants 
will depend on mutational analysis of each, a task 
that is currently underway. However, it is still per- 
tinent to ask to what extent homologous functions 
can be assigned to structurally related MYB proteins 
from different plant species, and what is the extent of 
redundancy in MYB gene activity. 

Are structurally similar MYB proteins 
functionally homologous? 

Table 1. List of plant MYB-related genes for which function has been assigned. The subgroup for the DNA-binding domain of the
R2R3 gene is also listed according to Romero et al. (1998).

MYB genes               Biological functions                        Species                     R2R3 subgroup (Romero et al., 1998)

One-repeat Myb
StMYB1                  Unknown                                     Solanum tuberosum
LHY                     Circadian clock regulation, flowering time  Arabidopsis thaliana
CCA1                    Phytochrome & circadian regulation          Arabidopsis thaliana
PcMYB1                  Light-dependent activation                  Petroselinum crispum
CPC1                    Epidermal cell differentiation, root hairs  Arabidopsis thaliana
BPF1                    Telomeric DNA binding protein               Petroselinum crispum
IBP1                    Telomeric DNA binding protein               Zea mays

R2R3 Myb
                        Phenylpropanoid metabolism
ZmMYBC1                 Anthocyanin                                 Zea mays                    Subgroup C
ZmMYBPL                 Anthocyanin                                 Zea mays                    Subgroup C
ZmMYB1                  Anthocyanin                                 Zea mays                    Subgroup C
ZmMYB38                 Inhibition of C1-mediated activation        Zea mays                    Subgroup C
PhMYBAN2                Anthocyanin                                 Petunia hybrida             Subgroup C
PhMYB3                  Anthocyanin                                 Petunia hybrida             Subgroup C
AmMYB305, 340           Anthocyanin and flavonol                    Antirrhinum majus           Subgroup C
PsMYB26                 Phenylpropanoid regulation                  Pisum sativum               Subgroup C
ZmMYBP                  Phlobaphene                                 Zea mays                    Subgroup C
AmMYB308, 330           Phenolic acid                               Antirrhinum majus           Subgroup C
                        Development
AtMYBGL1                Trichome development                        Arabidopsis thaliana        Subgroup C
AmMYBMIXTA              Conical cell development                    Antirrhinum majus           Subgroup C
PhMYB1                  Conical cell development                    Petunia hybrida             Subgroup C
CotMYBA                 Trichome development                        Gossypium hirsutum          Subgroup C
AmMYBPHAN               Dorsoventral determination & growth         Antirrhinum majus           AmMYBPHAN subgroup
ZmMYBRS2                PHAN-like, repress knox expression          Zea mays                    AmMYBPHAN subgroup
AtMYB13                 Shoot morphogenesis                         Arabidopsis thaliana        Subgroup C
AtMYB103                Expressed in developing anthers             Arabidopsis thaliana        Subgroup C
                        Signal transduction
GAMYB                   Gibberellin response                        Hordeum vulgare             Subgroup B
AtMYB2                  Dehydration and ABA regulation              Arabidopsis thaliana        Subgroup C
ATR1                    Tryptophan biosynthesis                     Arabidopsis thaliana        Subgroup C
Cpm5, Cpm7, Cpm10       Dehydration and ABA response                Craterostigma plantagineum
                        Plant disease resistance
NtMYB1                  TMV, SA-inducible                           Nicotiana tabacum
                        Cell division
AtCDC5                  Cell cycle regulation                       Arabidopsis thaliana

R1R2R3 Myb
MSA-binding proteins    Regulation of B-type cyclin genes           Nicotiana tabacum
AtF4D11.7               Unknown                                     Arabidopsis thaliana
AtF6N23.19              Unknown                                     Arabidopsis thaliana

The best way to consider whether structurally related
MYB proteins from different species share homologous
functions is to consider the best characterised
subfamily of R2R3 MYBs, those controlling anthocyanin
biosynthesis: ZmMYBC1 and ZmMYBPL
from maize, PhMYBAN2 from Petunia and AmMYBROSEA
from Antirrhinum. The proteins encoded by these
genes are structurally related, especially in their DNA-binding
domains which are very similar, but also in
their C-terminal sequences. The obvious conclusion
is that these genes are structurally and functionally
homologous. However, it is known that ZmMYBC1
(ZmMYBPL) and PhMYBAN2 do not regulate exactly
the same target genes in maize and petunia, in
that an2 mutants of petunia are not affected in their
expression of some structural genes of anthocyanin
biosynthesis, whereas c1 mutants in maize show reduced
expression of all the structural genes (Martin
and Paz-Ares, 1997). Rosea mutants of Antirrhinum
show reduced expression of a different subset of structural
genes to c1 and an2 mutants, demonstrating that
although the effects of loss of gene function are similar
in all cases (loss or reduction in pigment production)
the biochemical functions of each MYB protein are
not precisely homologous. This suggests that those
Arabidopsis genes encoding similar MYB products
(AtMYB75 and AtMYB90) may serve roles in regulating
anthocyanin biosynthesis, but it cannot be predicted
precisely which target genes will be regulated by them.

Do MYB genes show extensive redundancy in their 
functions? 

The degree of genetic redundancy within MYB gene 
subfamilies can also be assessed by considering those 
genes controlling anthocyanin biosynthesis. In maize, 
the MYB genes ZmMYBC1 and ZmMYBPL clearly
have the same function: to activate transcription of
the structural genes of anthocyanin biosynthesis. The
two genes work in different plant tissues: ZmMYBC1
in the aleurone and some tissues of the flowers, ZmMYBPL
in the vegetative plant tissues. In this example,
paralogous genes, which have most likely arisen by 
gene duplication, have adopted different expression 
patterns. It is likely that in other species pigmentation 
patterns may result from similar activities of paral- 
ogous regulatory genes. The maize gene ZmMYBP 
controls phlobaphene biosynthesis in pericarp tissue. 
Phlobaphenes are derived from the flavonoid pathway 
that also gives rise to anthocyanins, and ZmMYBP 
is known to activate some, but not all of the target 
genes of ZmMYBC1, although it is thought to bind
to sites in the promoters of the structural genes with
differing affinity to ZmMYBC1. ZmMYBP is quite
closely related structurally to ZmMYBC1, suggesting
that structurally related proteins (particularly if related
in their DNA-binding domains) perform related
although not identical functions.

This general idea is supported by studies on the 
function of some members of R2R3 MYB C-terminal 
subgroup 4 which share very similar DNA-binding domains
and a C-terminal region potentially encoding a
zinc-finger motif (Kranz et al., 1998). This C-terminal
subgroup is most closely related to the C-terminal
subgroups including ZmMYBC1, ZmMYBPL, ZmMYBP
and PhMYBAN2 (subgroups 5, 6 and 7;
Kranz et al., 1998). It is known that some members
of subgroup 4 can regulate expression of genes
involved in hydroxycinnamic acid metabolism, another
branch of phenylpropanoid metabolism linked
to flavonoid metabolism by three common steps at the
start of each pathway. Again, structural similarity appears
to reflect functional similarity, although it is not
thought that members of subgroup 4 normally regulate
any structural genes in common with ZmMYBC1 or
PhMYBAN2.

Even where gene products are closely related struc- 
turally and produced in the same species it cannot 
be assumed that their functions are redundant or that 
they are paralogous. This is most clearly illustrated 
by the vertebrate MYB family c-MYB, A-MYB and 
B-MYB. All three proteins share virtually identical 
DNA-binding domains, and are known to be able 
to bind to the same target DNA sequences. However,
numerous ectopic expression studies and analyses
of knock-out mutants of mice suggest that the proteins
are not functionally equivalent. c-MYB and
A-MYB are structurally most similar, sharing a central
activation domain, a region for interaction with
the transcription factor CBP, and a C-terminal negative
regulatory domain as well as the MYB DNA-binding
domain. However, c-MYB has another motif
in its C-terminal region for interaction with transcriptional
co-activators, not found in A-MYB. B-MYB has
a C-terminal region involved in cell-cycle-regulated
phosphorylation, present in A-MYB but not c-MYB.
A-MYB is expressed largely in spermatogenic tissue
whereas c-MYB is expressed in haematopoietic tissues
and epithelium. It is possible that A-MYB and
c-MYB, as a result of their structural similarities, are
paralogous (in an equivalent way to ZmMYBC1 and
ZmMYBPL), regulating cellular proliferation and the
commitment to cellular differentiation, through essentially
the same target genes, in different tissues.
However, the unique features of each protein argue
against identical functions and, rather, suggest over- 
lapping functions, with some but not all target genes 
in common (Weston, 1998). 

B-MYB is less similar to c-MYB than A-MYB. 
Although B-MYB is expressed generally in tissues it 
cannot compensate for loss of c-MYB or A-MYB activity
(knock-outs in mice give phenotypes), arguing
that it is not functionally homologous to either. Lines
overexpressing c-MYB and B-MYB also give different
phenotypes. B-MYB may regulate the cell cycle
through control of cell cycle progression from G1 to S
whereas A-MYB may be more closely associated with 
the control of meiosis. These details suggest that struc- 
turally related MYB proteins may share related (over- 
lapping) but non-identical functions (Weston, 1998). It 
seems likely that such generalisations are also broadly 
applicable to the MYB subgroups in plants (especially 
within the R2R3 MYB subgroups). 

While the concept of paralogous genes control- 
ling the same functions within a species in different 
tissues provides an attractive explanation for some 
of the duplication of R2R3 MYBs in plants, current 
views on how transcription factors control gene ex- 
pression suggest that differences in expression pattern 
may also be instrumental in dictating differences in 
functionality (Sieweke and Graf, 1998). Transcription 
factors generally interact with other proteins associ- 
ated with transcriptional control, and it is unlikely 
that plant MYBs are exceptions. If they assemble 
independently of their DNA target sites, into multi- 
component complexes, the nature of these complexes 
will depend upon the interacting proteins available in 
any particular cell type. Differences in the make-up of 
these complexes may then result in distinct activities 
on different target promoters. The idea that differ- 
ences in protein-protein interactions (dictated in part 
by differences in gene expression patterns) account for 
functional differences between very similar transcrip- 
tion factors has already been proposed to account for 
functional differences between vertebrate MYB pro- 
teins. It is equally likely that similar considerations 
apply to plant MYB transcription factors. There is ev- 
idence for this in the case of AmMYBMIXTA, which 
normally controls the formation of conical cells in 
petal epidermis. If this gene is ectopically expressed 
under control of the CaMV 35S promoter in tobacco or
Antirrhinum, it results in the formation of multicellular 
trichomes on leaves, a function it does not normally 
control. This suggests that AmMYBMIXTA can adopt 
novel functions through changes in its expression pattern,
possibly through changing the suite of available
interacting proteins. It also suggests that the function
of closely related MYB proteins expressed in appropriate
cells might be to regulate multicellular trichome
formation (Glover et al., 1998).

Conclusions 

MYB genes have expanded and diversified their func- 
tions during the evolution of flowering plants, and 
now regulate many different aspects of metabolism 
and development. The range of involvement of MYB 
transcription factors in controlling different aspects 
of plant gene expression is by no means fully char- 
acterised. While it is clear that a significant number 
of these MYB transcription factors are involved in 
the detailed regulation of secondary metabolism (par- 
ticularly phenylpropanoid metabolism), it is unlikely 
that many of the MYB proteins operating within a 
single species are truly redundant in their functions. 
In addition, the difficulty in establishing functional 
homology between structurally homologous proteins 
from different plant species suggests that variations in 
the activity of these regulatory proteins may make a 
major contribution to the variation in traits between 
species.